1. Introduction



tbd 



2. Video Object Plane (VOP)



2.1 VOP Definition 



The Video Object Planes (VOPs) correspond to entities in the bitstream that the user can access and
manipulate (cut, paste, ...). Together with each VOP, the encoder sends composition information indicating
where and when the VOP is to be displayed. At the decoder side, the user may be allowed to change the
composition of the displayed scene by interacting with the composition information.



Figure 2.1.1: VM Encoder and Decoder Structure (panels: at the encoder; at the decoder)



The VOP can be a semantic object in the scene: it is made of Y, U, V components plus shape information.
In the MPEG-4 video test sequences, the VOPs were either known by construction of the sequences (hybrid
sequences based on blue-screen composition, or synthetic sequences) or were defined by semi-automatic
segmentation. In the first case, the shape information is represented by an 8-bit component used for
composition (see section 5.2). In the second case, the shape is a binary mask. Both cases are currently
considered in the encoding process. The VOP can have an arbitrary shape.

The exact method used to produce the VOP from the video sequences is not described in this document. 

When the sequence has only one rectangular VOP of fixed size displayed at fixed intervals, it corresponds to
the frame-based coding technique.



2.2 VOP format 



This section describes the input library, the filtering process and the formation of the VOP.

Section 2.2.1 describes the test sequences library. Section 2.2.2 describes the suggested downsampling
process from ITU-R 601 format to SIF, CIF and QCIF formats. In this section, the acronym SIF is used to
designate the 352x240 and 352x288 formats at 30 Hz and 25 Hz, respectively, while CIF designates only
the 352x288 format at 30 Hz. Section 2.2.3 describes the VOP format.



2.2.1 Test sequences library 

All the test sequences will be available in either 50 Hz or 60 Hz ITU-R 601 format. The input library from
the November '95 and January '96 tests was adopted here. As the VM evolves, it is expected that more
representative sets of input sources will become available. The file formats in which the input sources are
distributed are as follows:

1) Luminance and chrominance (YUV) - ITU-R 601 format containing luminance and chrominance data

• one or more files per sequence;

• no headers;

• supply the number of files and the size in a separate README file;

• chain all frames without gaps;

• for each frame, chain the Y, U, V data without gaps;

• write the component data from the 1st line, 1st pixel, from left to right, top to bottom, down to the last
line, last pixel.

2) Segmentation masks - The format for the exchange of the mask information is similar to the one used
for the images, i.e. a segmentation mask has a format similar to ITU-R 601 luminance, where each pixel has
a label identifying the region it belongs to (label values are 0, 1, 2, ...). A segmentation may have a
maximum of 256 segments (regions). Whenever possible, the segments should have a semantic meaning and
correspond to the VOPs.

3) Grey Scale Alpha Plane files - ITU-R 601 format - containing the alpha values. The same format as 
the ITU-R 601 luminance file is used. All values between 0 and 255 may be used. For the layered 
representation of a sequence, each layer has its own YUV and alpha files. 
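Because the YUV files in 1) are headerless and chained without gaps, accessing a frame is just offset arithmetic. The following is a minimal C sketch. It assumes one byte per sample and a single file holding the whole sequence. The picture and chroma sizes are parameters because the README supplies them. The names `FrameLayout`, `frame_layout` and `read_frame` are illustrative only.

```c
#include <stdio.h>

/* Sizes, in bytes, of one frame of a headerless planar YUV file with 8-bit
   samples: the Y plane followed by the U and V planes. */
typedef struct {
    long y_size;      /* luminance bytes per frame */
    long c_size;      /* bytes per chrominance component (U or V) */
    long frame_size;  /* y_size + 2 * c_size */
} FrameLayout;

FrameLayout frame_layout(int width, int height, int c_width, int c_height)
{
    FrameLayout l;
    l.y_size = (long)width * height;
    l.c_size = (long)c_width * c_height;
    l.frame_size = l.y_size + 2 * l.c_size;
    return l;
}

/* Reads frame n (0-based) into caller-provided buffers. Frames are chained
   without gaps, so frame n starts at byte n * frame_size.
   Returns 0 on success, -1 on a seek or short-read error. */
int read_frame(FILE *f, FrameLayout l, long n,
               unsigned char *y, unsigned char *u, unsigned char *v)
{
    if (fseek(f, n * l.frame_size, SEEK_SET) != 0) return -1;
    if (fread(y, 1, (size_t)l.y_size, f) != (size_t)l.y_size) return -1;
    if (fread(u, 1, (size_t)l.c_size, f) != (size_t)l.c_size) return -1;
    if (fread(v, 1, (size_t)l.c_size, f) != (size_t)l.c_size) return -1;
    return 0;
}
```

Take a 625-line ITU-R 601 file in 4:2:2 format as an example: 720x576 luminance and 360x576 per chrominance component. One frame then occupies 414720 + 2 x 207360 = 829440 bytes. Sequences split over several files need an extra step to pick the right file first.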

The test sequences library is separated into the following classes: 

Class A: Low spatial detail and low amount of movement 

Class B: Medium spatial detail and low amount of movement or vice versa 

Class C: High spatial detail and medium amount of movement or vice versa 
Class D: Stereoscopic

Class E: Hybrid natural and synthetic 



The following table lists the input sequences, their format and the available files. 



Sequence Name       Class   Input Format        YUV files   Alpha files   Segment Mask files
Mother & daughter   A       ITU-R 601 (60Hz)    1           0             0
Akiyo               A       ITU-R 601 (60Hz)    2+1         1             2
Hall Monitor        A       ITU-R 601 (60Hz)    1           0             3
Container Ship      A       ITU-R 601 (?)       1           0             0
Sean                A       ITU-R 601 (60Hz)    1           0             3
Foreman             B       ITU-R 601 (50Hz)    1           0             0
News                B       ITU-R 601 (60Hz)    4+1         3             4
Silent Voice        B       ITU-R 601 (50Hz)    1           0             0
Coastguard          B       ITU-R 601 (60Hz)    1           0             4
Bus                 C       ITU-R 601 (60Hz)    1           0             0
Table Tennis        C       ITU-R 601 (50Hz)    1           0             0
Stefan              C       ITU-R 601 (60Hz)    1           0             2
Mobile & Calendar   C       ITU-R 601 (60Hz)    1           0             0
Tunnel              D       ITU-R 601 (50Hz)    2x1         0             0
Fun Fair            D       ITU-R 601 (50Hz)    2x1         0             0
Children            E       ITU-R 601 (60Hz)    3+1         2             3
Bream               E       ITU-R 601 (60Hz)    3+1         2             3
Weather             E       ITU-R 601 (60Hz)    2+1         1             2
Destruction         E       ITU-R 601 (60Hz)    11+1        10            0

Table 2.2.1: List of input library files

Note: N+1 indicates that the sequence consists of N layers and the composed sequence.
Nx1 indicates that the sequence consists of N views.

2.2.2 Filtering process

The filtering process for YUV is based on document [MPEG95/0322]. The filtering process for alpha
planes (A) is based on document [MPEG95/0393]. Software for performing the filtering process was
distributed and can also be obtained from the MPEG ftp site 'drop.chips.ibm.com:Tampere/Contrib/m0896.zip'.

In the first step, the first field of a picture is omitted (both luminance and chrominance). Then the physically
centred 704x240/288 luminance and 352x240/288 chrominance pixels are extracted. This format is used to
create all the smaller formats using the filters listed in Table 2.2.2 and following the steps described below.
Filter   Factor   Tap no.   Filter taps                                              Divisor
A        1/2      1         5, 11, 11, 5                                             32
B        1/2      1         2, 0, -4, -3, 5, 19, 26, 19, 5, -3, -4, 0, 2             64
C        1/4      1         -5, -4, 0, 5, 12, 19, 24, 26, 24, 19, 12, 5, 0, -4, -5   128
D        6/5      1         -16, 22, 116, 22, -16                                    128
                  2         -23, 40, 110, 1                                          128
                  3         -24, 63, 100, -11                                        128
                  4         -20, 84, 84, -20                                         128
                  5         -11, 100, 63, -24                                        128
                  6         1, 110, 40, -23                                          128
E        3/5      1         -24, -9, 88, 146, 88, -9, -24                            256
                  2         -28, 17, 118, 137, 53, -26, -15                          256
                  3         -15, -26, 53, 137, 118, 17, -28                          256
F        1/2      1         -12, 0, 140, 256, 140, 0, -12                            512

Table 2.2.2: Filter taps for downsampling



ITU-R 601 to CIF/SIF

For Y

704x240 - B -> 352x240 - D -> 352x288
704x288 - B -> 352x288

For U and V

352x240 - B -> 176x240 - D -> 176x288 - A -> 176x144
352x288 - B -> 176x288 - A -> 176x144

For A

704x240 - F -> 352x240 - D -> 352x288
704x288 - F -> 352x288

ITU-R 601 to QCIF

For Y and A

704x240 - C -> 176x240 - E -> 176x144
704x288 - C -> 176x288 - B -> 176x144

For U and V

352x240 - C -> 88x240 - E -> 88x144 - A -> 88x72
352x288 - C -> 88x288 - B -> 88x144 - A -> 88x72

The resulting position of the chrominance samples relative to the luminance samples is as follows:

x x x x
o o
x x x x

where x: luminance, o: chrominance

Figure 2.2.1: Position of chrominance samples after filtering

Notes:

The 4:2:2 to 4:2:0 conversion is done in the last step so that the correct position of the chroma
samples is preserved.

For input sequences in 4:2:0 format, a conversion from 4:2:0 to 4:2:2 is performed before the filtering
process starts. The interpolation filter is (1, 3, 3, 1), as specified in document [WG11/N0999].

Filtering of border pixels: when some of the filter taps fall outside the active picture area, the edge
pixel is repeated into the blanking area.
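Each row of Table 2.2.2 is an FIR filter whose taps are normalised by the divisor. The border rule above amounts to clamping the sample index into the line. A minimal C sketch for the integer decimation filters (A, B, C, F) follows. The table does not specify how the filter centre aligns with the input samples (`center`), how results are rounded, or how they are clipped to [0, 255], so all three are assumptions. The fractional filters D and E switch tap sets per output phase ("Tap no.") and are not covered.

```c
/* Index into a line of n samples, repeating the edge pixel outside it. */
static int clamp_index(int i, int n)
{
    if (i < 0) return 0;
    if (i >= n) return n - 1;
    return i;
}

/* Decimate a line of n_in samples by an integer factor `decim` using FIR taps
   from Table 2.2.2. Output k is centred on input k * decim, with tap index
   `center` aligned to that input sample (alignment is an assumption). */
void fir_decimate(const unsigned char *in, int n_in, int decim,
                  const int *taps, int n_taps, int center, int divisor,
                  unsigned char *out, int n_out)
{
    for (int k = 0; k < n_out; k++) {
        long acc = 0;
        for (int t = 0; t < n_taps; t++)
            acc += (long)taps[t] * in[clamp_index(k * decim + t - center, n_in)];
        /* Round to nearest and clip to the 8-bit range. Negative sums end up
           at 0 regardless of how the division rounds them. */
        long q = (acc + divisor / 2) / divisor;
        out[k] = (unsigned char)(q < 0 ? 0 : (q > 255 ? 255 : q));
    }
}
```

For example, filter B could be applied horizontally to a 704-sample luminance line as `fir_decimate(row, 704, 2, B_taps, 13, 6, 64, out, 352)`. Here `B_taps` holds the 13 coefficients from the table. Because every tap set sums to its divisor, a flat area passes through unchanged.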

Processing of grey scale alpha planes

The downsampling process for alpha planes is the same as for luminance (Y). However, for alpha planes a
different filter (filter F) is used for horizontal 2-to-1 filtering. This filter preserves more of the high-frequency
band and therefore maintains sharp edges in the alpha planes.

For the grey scale alpha planes in Class E sequences all the values below a certain threshold are set to 0. 
The following threshold values are recommended: 



Sequence      VOP     Name             Threshold
children      VOP0    children_0       -
              VOP1    children_1       64
              VOP2    children_2       64
weather       VOP0    weather_0        -
              VOP1    weather_1        64
bream         VOP0    bream_0          -
              VOP1    bream_1          64
              VOP2    bream_2          64
destruction   VOP0    destruction_0    -
              VOP1    destruction_1    64
              VOP2    destruction_2    64
              VOP3    destruction_3    64
              VOP4    destruction_4    64
              VOP5    destruction_5    64
              VOP6    destruction_6    64
              VOP7    destruction_7    64
              VOP8    destruction_8    32*
              VOP9    destruction_9    64
              VOP10   destruction_10   64

Table 2.2.3: Threshold values for Class E sequences



Processing of segmentation masks

The segmentation mask is first converted to binary alpha planes.

An object can occupy one or more segments in the segmentation mask. The binary shape information is set
to '255' for all pixels that have the label values of the selected segments. All other pixels are considered
outside the object and are given a value of '0'.

The downsampling process for the binary alpha plane follows that of the grey scale alpha planes. A
threshold of 128 is selected: all filtered values below this threshold are set to '0', whereas all filtered values
above the threshold are set to '255'.
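The following is a minimal C sketch of the two mask operations above. It assumes 8-bit planes stored as flat arrays. `mask_to_alpha` and `rebinarise` are illustrative names. The text does not say what happens to a filtered value of exactly 128, and this sketch maps it to 255.

```c
#include <stddef.h>

/* Segmentation mask -> binary alpha plane: 255 for pixels whose label belongs
   to the object (selected[label] != 0), 0 for all other pixels. */
void mask_to_alpha(const unsigned char *labels, size_t n,
                   const unsigned char selected[256], unsigned char *alpha)
{
    for (size_t i = 0; i < n; i++)
        alpha[i] = selected[labels[i]] ? 255 : 0;
}

/* After downsampling: threshold the filtered binary alpha plane at 128.
   A value of exactly 128 is mapped to 255 here (assumption). */
void rebinarise(unsigned char *alpha, size_t n)
{
    for (size_t i = 0; i < n; i++)
        alpha[i] = (alpha[i] < 128) ? 0 : 255;
}
```

An object made of several segments is selected by setting `selected[label]` for each of its labels before the conversion.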



2.2.3. VOP file format 

The following is the VOP file format. Each VOP consists of downsampled Y, U and V data files and the
alpha plane, as specified in section 2.2.2. For simplicity, the same alpha file format is used for binary as well
as grey scale shape information. For binary shape information, the value 0 is used to indicate a pixel
outside the object and the value 255 is used to indicate a pixel inside the object. For grey scale shape
information, the whole range of values between 0 and 255 is used. VOP0 is a special case where the alpha
values are all 255. The blending operation is described in section 5.2.

2.2.4. Coding of test sequences whose width and height are not integral multiples of 16

In order to code test sequences whose width and height are not integral multiples of 16 (the macroblock
size), the width and height of these sequences are first extended to the smallest integral multiples of 16. The
extended areas of the images are then padded using the repetitive padding technique described in section 3.3.1.

3. Encoder Definition 



3.1 Overview 

Figure 3.1.1 presents a general overview of the VOP encoder structure. The same encoding scheme is
applied when coding all the VOPs of a given session.

Figure 3.1.1: VOP encoder structure.

The encoder is mainly composed of two parts: the shape coder and the traditional motion & texture coder,
both applied to the same VOP. The VOP is represented by means of a bounding rectangle, as described below.
The phase between the luminance and chrominance samples of the bounding rectangle has to be set correctly
according to the 4:2:0 format, as shown in Figure 3.1.2. Specifically, the top-left coordinates of the bounding
rectangle should be rounded to the nearest even numbers not greater than the top-left coordinates of the
tightest rectangle. Accordingly, the top-left coordinates of the bounding rectangle in the chrominance
components are those of the luminance divided by two.
Figure 3.1.2: Luminance versus chrominance bounding box positioning (x: luminance, o: chrominance)
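The bounding-rectangle rule can be sketched in C as follows. This is an illustrative sketch, not the VM software. It assumes the alpha plane is a flat 8-bit array in which nonzero means "inside the VOP". When the top-left corner moves to even coordinates, it keeps the right and bottom edges of the tightest rectangle.

```c
/* Rectangle in luminance sample coordinates. */
typedef struct { int x, y, w, h; } Rect;

/* Bounding rectangle of the nonzero samples of an alpha plane, with the
   top-left corner rounded down to even coordinates. Returns 0 if the plane
   is entirely transparent. */
int vop_bounding_rect(const unsigned char *alpha, int width, int height, Rect *r)
{
    int x0 = width, y0 = height, x1 = -1, y1 = -1;

    for (int y = 0; y < height; y++)
        for (int x = 0; x < width; x++)
            if (alpha[y * width + x] != 0) {      /* tightest rectangle */
                if (x < x0) x0 = x;
                if (y < y0) y0 = y;
                if (x > x1) x1 = x;
                if (y > y1) y1 = y;
            }
    if (x1 < 0)
        return 0;

    r->x = x0 - (x0 % 2);     /* nearest even number not greater than x0 (x0 >= 0) */
    r->y = y0 - (y0 % 2);
    r->w = x1 - r->x + 1;     /* right/bottom edges kept (assumption) */
    r->h = y1 - r->y + 1;
    return 1;
}

/* Top-left of the same rectangle in the 4:2:0 chrominance planes. */
void chroma_origin(const Rect *r, int *cx, int *cy)
{
    *cx = r->x / 2;
    *cy = r->y / 2;
}
```

Because the luminance origin is always even, halving it gives an exact chrominance origin, so the two sample grids stay in phase.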
3.2 Shape Coding 

This section describes the coding methods for binary and grey scale shape information. The shape
information is hereafter referred to as alpha planes. Binary alpha planes are encoded by modified MMR,
while grey scale alpha planes are encoded by quadtree with vector quantization. An alpha plane is bounded
by a rectangle that includes the shape of a VOP, as described in section 3.1. The bounding rectangle of the
VOP is then extended on the right-bottom side to multiples of 16x16 blocks. The extended alpha samples
are set to zero. The extended alpha plane is partitioned into blocks of 16x16 samples (hereafter referred to
as alpha blocks), and the encoding/decoding process is done per alpha block.

If the pixels in a macroblock are all transparent (all zero), the macroblock is skipped before motion and/or
texture coding. No overhead is required to indicate this mode, since the transparency information can be
obtained from shape coding. This skipping applies to all I-, P- and B-VOPs.
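The right-bottom extension and the transparent-macroblock test can be sketched in C as follows. The sketch assumes an 8-bit alpha plane of size w x h covering the bounding rectangle. As the text specifies, samples in the extended area are treated as zero. The function names are hypothetical.

```c
/* Number of 16x16 alpha blocks along one dimension after extending it on the
   right/bottom side to the next multiple of 16. */
int blocks_16(int size)
{
    return (size + 15) / 16;
}

/* Returns 1 if 16x16 alpha block (bx, by) of a w x h alpha plane is entirely
   transparent. Samples in the extended area (x >= w or y >= h) are zero. */
int alpha_block_transparent(const unsigned char *alpha, int w, int h, int bx, int by)
{
    for (int y = by * 16; y < by * 16 + 16; y++)
        for (int x = bx * 16; x < bx * 16 + 16; x++)
            if (x < w && y < h && alpha[y * w + x] != 0)
                return 0;
    return 1;
}
```

An encoder would run `alpha_block_transparent` on each of the `blocks_16(w) * blocks_16(h)` blocks and skip motion and texture coding for the blocks that return 1. The decoder can repeat the same test on the decoded shape, which is why no extra flag is needed.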



3.2.1 Binary Shape Coding: Modified MMR
3.2.1.1 Overview 

(1) The bounding rectangle of the VOP is extended on the right-bottom side to multiples of 16x16 blocks.
The position and the size of the rectangle are coded by VOP_horizontal_mc_spatial_ref,
VOP_vertical_mc_spatial_ref, VOP_width and VOP_height.

(2) Basically, MMR coding is performed by detecting the color-changing pixels (from black (object) to
white (background) or from white to black) and calculating the distance between the current changing
pixel and the previous changing pixel.

(3) If all pixels in a MB are white (= All white MB) or black (= All black MB), MMR coding is not carried
out.

(4) Fig.M1 shows an example.

(4-1) The color of the first pixel (the top-left pixel) in a MB (= a0_color) is coded using 1 bit
(black: a0_color = 1, white: a0_color = 0).

(4-2) If the changing pixel is in the top line of a MB, the bottom line of the reconstructed upper MB is used
as the reference area (Top reference in Fig.M1). If the changing pixel is in the left-end column of a MB,
the right-end column of the reconstructed previous MB is used as the reference area (Left reference in
Fig.M1). In other cases, the reference area is defined as the previous N pixels (N is the number of pixels in
one pixel line of a MB), including the changing pixel (see Fig.M2(b)). Therefore, the black dots (•) in
Fig.M1 are regarded as changing pixels.

(4-3) If a MB is in the top MB line or in the left-side MB column of a frame, the colors of the Top reference
or Left reference are set to "white" by convention.

(4-4) The distance between the current changing pixel and the previous changing pixel is coded. There are
three modes in this algorithm (see 3.2.1.3 for details): Horizontal mode, Vertical Pass mode and Vertical
mode.

(4-5) Shape information of the original size can be size-converted for rate control and rate reduction. The
conversion ratios (CR) proposed here are 1 (original size), 1/2 and 1/4. The CR information is VLC coded
(see Table M1).



Fig.M1: Changing pixel and reference area (Top reference shown; •: changing pixel)



3.2.1.2 Detailed explanation

Fig.M2 shows an example of the proposed modified MMR.
a0: The starting pixel of the coding process.
a1: The next changing pixel after a0.

b1: If b1 is on r1 of the reference area (see Fig.M2(b)), b1 is the first color-changing pixel whose color is
opposite to that of a0. If b1 is on r2 of the reference area (see Fig.M2(b)), b1 is the first color-changing
pixel from the Left reference.

The relative addresses of a0, a1 and b1 can be calculated as follows. Here r_X means the relative address
of pixel X (X = a0, a1, b1), and abs_X means the absolute address of pixel X counting from the top left of a
MB. Furthermore, WIDTH means the number of pixels in one pixel line of a MB.

(1) r_a0 = abs_a0 - (int)(abs_a0/WIDTH) x WIDTH

(2) r_a1 = abs_a1 - (int)(abs_a0/WIDTH) x WIDTH

The detailed detection algorithm for a1 is described in section 3.2.1.4.

(3) r_b1 = abs_b1 - ((int)(abs_a0/WIDTH) - 1) x WIDTH

The detailed detection algorithm for b1 is described in section 3.2.1.4.



Fig.M2: Modified MMR ((a), (b))
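As a worked example of formulas (1) to (3), the following C sketch evaluates the relative addresses. `RelAddr` and `relative_addresses` are illustrative names. Take WIDTH = 16 and abs_a0 = 37, which lies on line 2, so r_a0 = 5. Then abs_a1 = 42 gives r_a1 = 10, and abs_b1 = 25, on the line above, gives r_b1 = 9. The DETECT b1 procedure in section 3.2.1.4 uses negative abs_b1 values for the Top reference, and these map onto that reference line in the same way.

```c
/* Relative addresses of a0, a1 and b1 (section 3.2.1.2). Absolute addresses
   count in raster order from the top-left pixel of the MB; a negative abs_b1
   denotes a position in the Top reference. */
typedef struct { int r_a0, r_a1, r_b1; } RelAddr;

RelAddr relative_addresses(int abs_a0, int abs_a1, int abs_b1, int width)
{
    RelAddr r;
    int a0_line = abs_a0 / width;             /* abs_a0 >= 0, so this is floor */

    r.r_a0 = abs_a0 - a0_line * width;        /* (1) */
    r.r_a1 = abs_a1 - a0_line * width;        /* (2) */
    r.r_b1 = abs_b1 - (a0_line - 1) * width;  /* (3) */
    return r;
}
```

In the example, |r_a1 - r_b1| = 1. This is the DIST value that Vertical mode would code.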



3.2.1.3 Three modes of modified MMR

3.2.1.3.1 Horizontal mode

Fig.M3(a) shows an example of this mode.

• In this case, the ULB (Unchanged-length of binary information = abs_a1 - abs_a0; the range of ULB is
1 to WIDTH) is coded. In practice, the value (ULB - 1) is coded using a fixed-length code word (4 bits if
WIDTH is 16, and 3 or 2 bits if WIDTH is 8 or 4, respectively). Therefore, no table is needed.

• Before Horizontal mode starts, the Horizontal mode flag H (= 001: see Table M2) is sent first.

• After the flag H, the ULB code is sent.

• Since the maximum value of the Unchanged-length is WIDTH, Vertical Pass mode is used if the
Unchanged-length is longer than WIDTH (see the next section and Fig.M4).
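The Horizontal mode code word can be sketched in C as below. For readability, it emits the bits as '0'/'1' characters. The flag H = 001 and the field widths follow the text. Two things are assumptions: that the (ULB - 1) field is written most-significant bit first, and the function name `horizontal_mode_code`.

```c
#include <string.h>

/* Horizontal mode code word as '0'/'1' characters: flag H = "001" followed by
   (ULB - 1) in a fixed-length field of 4, 3 or 2 bits for WIDTH = 16, 8 or 4.
   out must hold at least 8 characters. Returns the number of bits, or -1 if
   WIDTH or ULB (valid range 1..WIDTH) is invalid. */
int horizontal_mode_code(int ulb, int width, char *out)
{
    int nbits;

    if (width == 16)      nbits = 4;
    else if (width == 8)  nbits = 3;
    else if (width == 4)  nbits = 2;
    else return -1;
    if (ulb < 1 || ulb > width) return -1;

    strcpy(out, "001");                            /* Horizontal mode flag H */
    for (int k = 0; k < nbits; k++)                /* MSB first (assumption) */
        out[3 + k] = (char)('0' + (((ulb - 1) >> (nbits - 1 - k)) & 1));
    out[3 + nbits] = '\0';
    return 3 + nbits;
}
```

For example, with WIDTH = 16 and ULB = 5 the sketch produces "0010100": the flag 001 followed by 4 (= ULB - 1) in four bits.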

3.2.1.3.2 Vertical Pass mode 

Vertical Pass mode is used when the Unchanged-length becomes longer than WIDTH.

• Fig.M4 shows an example of output code words in the case of Vertical Pass mode. From a0 to c0 in
Fig.M4(a), the colors of the pixels are the same ("white" in this case). Therefore, this part becomes
Horizontal mode (H), and the Unchanged-length of binary information (ULB) is WIDTH. But since the
pixels in the rest of this pixel line are also "white", Vertical Pass mode can be used.

• The code word WIDTH is used as the signal for changing from Horizontal mode to Vertical Pass mode
(the Vertical_pass_mode flag is set to TRUE). Usually, V0 means DIST = 0 of Vertical mode (see Table
M2), but V0s after WIDTH mean "This line is Vertical Pass mode".

• In the same way, the next two pixel lines (the third and the fourth pixel lines in Fig.M4(a)) are coded in
Vertical Pass mode. Even if the color of the line changes, as shown in Fig.M4(a) (the third line), this
line also becomes Vertical Pass mode because there is no change from the Left reference.

• The next line (the fifth line) before a1 becomes Horizontal mode again. Therefore, abs_a1 - abs_c1 is
coded using Horizontal mode (c1 is the first pixel of this line). This information is called RLB
(= Residual-length of binary information), and its range is 0 to WIDTH-1. In this case, the RLB
code is sent after the flag H, and Vertical_pass_mode is reset (= FALSE).

• Fig.M4(b) is an example where Vertical Pass mode can be used from the beginning of a MB. That is, when
a0 is the first pixel of a MB and there is no b1 in the reference area, the mode becomes Vertical Pass mode
(Vertical_pass_mode = TRUE). Conversely, if b1 is found in the reference area, the mode becomes
Horizontal mode or Vertical mode (Vertical_pass_mode = FALSE).




Fig.M3: Horizontal mode and Vertical mode ((a) Horizontal mode, (b) Vertical mode)

Fig.M4: Vertical Pass mode ((a), (b))



3.2.1.3.3 Vertical mode

Fig.M3(b) shows an example of this mode. When this mode is identified, the position of a1 is coded
relative to the position of b1 (the absolute value of r_a1 - r_b1 = DIST in Table M2).

3.2.1.3.4 End of MB

After the last change of color, the End of MB code (EOMB = '0001': see Table M2) is sent, and the coding
process for the MB is finished.



3.2.1.4 Flowchart of encoder

Fig.M5 shows the flowchart of the coding algorithm.

Fig.M5: Flowchart of encoder



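The detailed processes of "DETECT a1" and "DETECT b1" in Fig.M5 are as follows (C program):

(1) DETECT a1

/* Returns 1 and sets *abs_a1 and *r_a1 if a1 is detected, 0 otherwise. */
int detect_a1(const int *pixel, const int *left_reference, int abs_a0,
              int WIDTH, int total_number_of_pixels, int *abs_a1, int *r_a1)
{
    int i, line, a0_line, pred_color;

    i = abs_a0 + 1;
    a0_line = abs_a0 / WIDTH;

    if (i >= total_number_of_pixels) {
        i = -1;                                  /* a0 is the last pixel of the MB */
    } else {
        do {
            line = i / WIDTH;
            if (line == a0_line) pred_color = pixel[abs_a0];
            else                 pred_color = left_reference[line];

            if (pixel[i] != pred_color) { *abs_a1 = i; i = 0; }   /* a1 found */
            else                          i++;

            if (i >= total_number_of_pixels) i = -1;
        } while (i > 0);
    }

    if (i == -1) return 0;                       /* a1 is not detected */
    *r_a1 = *abs_a1 - a0_line * WIDTH;           /* a1 is detected */
    return 1;
}

(2) DETECT b1

/* Returns 1 and sets *abs_b1 and *r_b1 if b1 is detected, 0 otherwise.
   A negative position i refers to the Top reference (top_reference[i + WIDTH]). */
int detect_b1(const int *pixel, const int *left_reference,
              const int *top_reference, int abs_a0, int WIDTH,
              int *abs_b1, int *r_b1)
{
    int i, j, line, a0_line, pred_color;

    i = abs_a0 + 1 - WIDTH;
    a0_line = abs_a0 / WIDTH;
    j = 1;                                       /* 1: searching, 0: found, -1: not found */

    do {
        line = i / WIDTH;
        if (line == a0_line) pred_color = left_reference[a0_line];
        else                 pred_color = pixel[abs_a0];

        if (i < 0) {                             /* position in the Top reference */
            if (top_reference[i + WIDTH] != pixel[abs_a0] &&
                top_reference[i + WIDTH - 1] != top_reference[i + WIDTH])
                { *abs_b1 = i; j = 0; }
        } else if (i % WIDTH == 0) {             /* first pixel of a line: left neighbour is in the Left reference */
            if (pixel[i] != pred_color && left_reference[line] != pixel[i])
                { *abs_b1 = i; j = 0; }
        } else {
            if (pixel[i] != pred_color && pixel[i - 1] != pixel[i])
                { *abs_b1 = i; j = 0; }
        }

        if (j != 0) {                            /* b1 not found at i: advance */
            i++;
            if (i > abs_a0) j = -1;              /* searched past a0 */
        }
    } while (j > 0);

    if (j == -1) return 0;                       /* b1 is not detected */
    *r_b1 = *abs_b1 - (a0_line - 1) * WIDTH;     /* b1 is detected */
    return 1;
}

where:

pixel[z]: Color of the z-th pixel from the top left of the MB.
left_reference[z]: Color of the z-th pixel from the top of the Left reference.
top_reference[z]: Color of the z-th pixel from the left end of the Top reference.
total_number_of_pixels: Total number of pixels in a MB,
e.g. 256 (CR = 1), 64 (CR = 1/2), 16 (CR = 1/4).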



3.2.1.5 Binary shape coding with Motion Compensation (MC)

3.2.1.5.1 Framework

The alpha plane is encoded per 16x16 alpha block, and the syntax is modified to encode the alpha data together
with the motion and texture data.

3.2.1.5.2 Description

An alpha plane is encoded per 16x16 alpha block. The alpha coding is strongly coupled to texture coding
and uses not only intra-frame coding but also inter-frame coding with motion compensation. Compared with
the scheme described in VM 2.2, this scheme contributes to both bit reduction and low delay (it enables
macroblock-based operation).

The alpha block is intra or inter frame coded with motion compensation. The motion vector is the same as that
of the YUV macroblock, and the residual of the alpha block is never coded.

Intra / inter alpha coding decision

Motion compensation is applied to the binary shape with the same scheme as for the luminance macroblock,
except that padding is not performed. The compensated alpha block is clamped before the alpha prediction
error is calculated. The selection of intra or inter alpha coding depends on the alpha prediction error and on
the intra/inter mode of the YUV macroblock at the same spatial position as the alpha block.
The alpha block is inter-coded if all of the following conditions are satisfied; otherwise it is intra-coded.

• The 16x16 alpha block is divided into sixteen 4x4 alpha sub-blocks. The alpha prediction error of a 4x4
alpha sub-block is the sum of the absolute prediction errors over the sub-block. The alpha prediction
error of each 4x4 alpha sub-block is less than or equal to a pre-determined threshold TH.

• The compensated alpha block is not all 0.

• The compensated alpha block is not all 255 and the previous reconstructed block is all 255.

• The YUV macroblock is inter coded.

Inter alpha block coding

Motion compensation is described in the previous section. No residual of the alpha block is coded, so only
first_MMR_code is coded for the alpha block. Note that the VLC of first_MMR_code for inter-coded alpha
blocks is different from that for intra-coded alpha blocks.

Clamping of coded alpha

Alpha values are rounded to the nearest value, 0 or 255. This eliminates the intermediate values between 0 and
255 that are produced by overlapped motion compensation.
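The intra/inter decision and the clamping rule can be sketched in C as follows. This is a hedged illustration. Blocks are flat arrays of 256 samples, and the names are hypothetical. The condition on all-255 blocks and on the previous reconstructed block is not evaluated here. The caller passes it in as a flag (`other_conditions_ok`), so it can follow the exact wording above.

```c
#include <stdlib.h>

/* Clamp an alpha sample to the nearest of 0 and 255. */
static int clamp_alpha(int a)
{
    return a < 128 ? 0 : 255;
}

/* Intra/inter decision for one 16x16 binary alpha block (flat, row-major).
   cur:  current alpha block (0/255); comp: motion-compensated block (0..255)
   th:   threshold on the sum of absolute errors of each 4x4 sub-block
   other_conditions_ok: caller-evaluated condition on all-255 blocks and on the
                        previous reconstructed block (see the text)
   yuv_inter: nonzero if the co-located YUV macroblock is inter coded
   Returns 1 for inter coding, 0 for intra coding. */
int alpha_block_is_inter(const int *cur, const int *comp, int th,
                         int other_conditions_ok, int yuv_inter)
{
    int comp_all_zero = 1;

    for (int sy = 0; sy < 16; sy += 4)
        for (int sx = 0; sx < 16; sx += 4) {
            int err = 0;                       /* error of this 4x4 sub-block */
            for (int y = sy; y < sy + 4; y++)
                for (int x = sx; x < sx + 4; x++) {
                    int c = clamp_alpha(comp[y * 16 + x]);  /* clamp before the error */
                    if (c != 0) comp_all_zero = 0;
                    err += abs(cur[y * 16 + x] - c);
                }
            if (err > th) return 0;            /* a sub-block exceeds TH: intra */
        }

    if (comp_all_zero) return 0;               /* compensated block is all 0: intra */
    return (other_conditions_ok && yuv_inter) ? 1 : 0;
}
```

The sketch stops at the first 4x4 sub-block whose error exceeds the threshold, because one failing sub-block is enough to force intra coding.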

3.2.1.6 Size conversion (Rate control)

Binary shape information can be size-converted for rate control and rate reduction. If rate control is
required, the size-conversion process is carried out for every MB except All white MBs and All black MBs.

1. The conversion ratio (CR) is 1 (original size), 1/2 or 1/4, and the CR information is VLC coded (see Table
M1).

2. Fig.M6 shows the size-conversion procedure. The 16x16 MB is down-sampled to 8x8 or 4x4, and up-sampled
to 16x16 again. The down- and up-sampling filters are applied to every MB as follows
(here, "1" means "black" and "0" means "white").

[Down sampling]

* From 16x16 to 8x8 (from "O" to "X" in Fig.M7(a))

If the average of the pixel values in a 2x2 pixel block is equal to or larger than 0.5, the pixel value of the
down-sampled block is set to "1". Otherwise, the pixel value of the down-sampled block is set to "0".

* From 16x16 to 4x4

If the average of the pixel values in a 4x4 pixel block is equal to or larger than 0.5, the pixel value of the
down-sampled block is set to "1". Otherwise, the pixel value of the down-sampled block is set to "0".

[Up sampling]

The pixel value of the up-sampled block is calculated by bilinear interpolation (from "O" to "X" in
Fig.M7(b)). If a bilinearly interpolated pixel value is equal to or larger than 0.5, the interpolated pixel value
is set to "1". Otherwise, the interpolated pixel value is set to "0". When interpolating pixel values at the MB
boundary, an imaginary pixel whose value equals that of the nearest down-sampled pixel inside the MB is
assumed to exist outside the MB (see Fig.M7(b)).

3. For the Top reference and Left reference, the following size-conversion processes (from 16 pixels to 16/N
pixels: CR = 1/N) are used (C program):

int i, j, tmp;

for (i = 0; i < 16/N; i++) {
    /* rounded average of N Top reference pixels */
    tmp = 0;
    for (j = 0; j < N; j++) tmp += top_reference[N*i + j];
    top_reference_N[i] = (int)((tmp + N/2) / N);

    /* rounded average of N Left reference pixels */
    tmp = 0;
    for (j = 0; j < N; j++) tmp += left_reference[N*i + j];
    left_reference_N[i] = (int)((tmp + N/2) / N);
}

where:

left_reference[z]: Color of the z-th pixel from the top of the Left reference.
left_reference_N[z]: Color of the z-th pixel from the top of the down-sampled (CR = 1/N)
Left reference.

top_reference[z]: Color of the z-th pixel from the left end of the Top reference.
top_reference_N[z]: Color of the z-th pixel from the left end of the down-sampled (CR = 1/N)
Top reference.

Fig.M6: Size-conversion (CR = 1/N: down sampling to 16/N x 16/N, up sampling, conversion error)




Fig.M7: Down sampling / Up sampling process ((a) down sampling, (b) up sampling; MB boundary shown)

4. Fig.M8 shows the flowchart for determining CR. An "Error-PB" in this chart is determined by the
following three steps.

(1) The sum of the conversion errors of the pixels is calculated for every 4x4 pixel block (PB). Here, the
conversion error is the absolute difference between the value of a pixel in the original MB and the value
of that pixel in the reconverted MB. If this sum is larger than the predetermined threshold TH1, the PB
becomes an "Error-PB" (in this case, a 4x4 block). If there is no 4x4 Error-PB in the MB, the next step is
carried out.

(2) The sum of the conversion errors is calculated for every 8x8 PB. If the sum is larger than the
predetermined threshold TH2, the PB becomes an "Error-PB" (in this case, an 8x8 block). If there is no
8x8 Error-PB in the MB, the next step is carried out.

(3) The sum of the conversion errors is calculated for the 16x16 PB (in this case, the PB is the MB
itself). If the sum is larger than the predetermined threshold TH3, the PB becomes an "Error-PB" (in
this case, a 16x16 block).

The threshold for each step depends on CR as follows.
CR     TH1 (4x4)           TH2 (8x8)          TH3 (16x16)
1/2    16 x (α_TH + 16)    64 x (α_TH + 4)    256 x α_TH
1/4    16 x (α_TH + 32)    64 x (α_TH + 8)    256 x α_TH

Fig.M8: CR determination algorithm



Modified MMR is carried out for each size-converted MB whose size is determined by the algorithm
shown in Fig.M8.

If the down-sampled MB becomes an All white MB or an All black MB, modified MMR is not carried out,
and only the code word first_MMR_code is transmitted.
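The down- and up-sampling rules of item 2 and the conversion error of item 4 can be sketched in C as follows. The sketch assumes that each down-sampled pixel sits at the centre of the N x N block it was computed from, which is our reading of Fig.M7. It counts differing pixels over the whole MB as the conversion error. Positions are kept in units of 1/(2N) of a down-sampled pixel, so the bilinear weights stay exact integers. All function names are illustrative.

```c
/* Binary MB pixels: 1 = black, 0 = white, row-major 16x16; n = 2 or 4 (CR = 1/n). */

/* Down-sample: a pixel is 1 if the average of its n x n block is >= 0.5. */
static void downsample(const int *mb, int n, int *d)
{
    int m = 16 / n;
    for (int r = 0; r < m; r++)
        for (int c = 0; c < m; c++) {
            int sum = 0;
            for (int i = 0; i < n; i++)
                for (int j = 0; j < n; j++)
                    sum += mb[(n * r + i) * 16 + n * c + j];
            d[r * m + c] = (2 * sum >= n * n) ? 1 : 0;
        }
}

/* Down-sampled pixel with clamped indices: the imaginary pixel outside the
   MB takes the value of the nearest down-sampled pixel inside it. */
static int at(const int *d, int m, int r, int c)
{
    if (r < 0) r = 0;
    if (r >= m) r = m - 1;
    if (c < 0) c = 0;
    if (c >= m) c = m - 1;
    return d[r * m + c];
}

/* floor(a / b) for b > 0 */
static int floor_div(int a, int b)
{
    return a >= 0 ? a / b : -((-a + b - 1) / b);
}

/* Up-sample to 16x16 by bilinear interpolation, thresholded at 0.5. */
static void upsample(const int *d, int n, int *up)
{
    int m = 16 / n, s = 2 * n;
    for (int y = 0; y < 16; y++) {
        int py = 2 * y + 1 - n;                        /* row position * 2n */
        int r0 = floor_div(py, s), wy = py - r0 * s;   /* 0 <= wy < s */
        for (int x = 0; x < 16; x++) {
            int px = 2 * x + 1 - n;
            int c0 = floor_div(px, s), wx = px - c0 * s;
            int acc = (s - wy) * ((s - wx) * at(d, m, r0, c0) + wx * at(d, m, r0, c0 + 1))
                    + wy * ((s - wx) * at(d, m, r0 + 1, c0) + wx * at(d, m, r0 + 1, c0 + 1));
            up[y * 16 + x] = (2 * acc >= s * s) ? 1 : 0;   /* value >= 0.5 */
        }
    }
}

/* Number of pixels that differ after down- and up-sampling with CR = 1/n. */
int conversion_error(const int *mb, int n)
{
    int d[64], up[256], err = 0;
    downsample(mb, n, d);
    upsample(d, n, up);
    for (int k = 0; k < 256; k++)
        err += (mb[k] != up[k]);
    return err;
}
```

Summing the same per-pixel differences over 4x4, 8x8 and 16x16 pixel blocks, rather than over the whole MB, gives the Error-PB tests of steps (1) to (3).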
3.2.1.7 Rounding process for bit-rate control



The following rounding process can be used to control the number of bits. The rounding process
starts at a 4x4 alpha sub-block. A 4x4 alpha sub-block is rounded to all '0' or all '255' when the
sum of the rounding errors is smaller than or equal to a pre-determined threshold (TH_QT). If the 4x4
alpha sub-block is not rounded, it is further subdivided into four 2x2 alpha sub-blocks. The
rounding process is then performed on each of the 2x2 alpha sub-blocks while the sum of the
rounding errors is smaller than or equal to TH_QT. The rounding process is given by the following
C program.

#include <limits.h>

/* blk: 4x4 alpha sub-block (values 0..255), th_qt: threshold TH_QT */
void round_4x4(int blk[4][4], int th_qt)
{
    int i, r, c, sum_x = 0;

    for (r = 0; r < 4; ++r)
        for (c = 0; c < 4; ++c)
            sum_x += blk[r][c];

    if (sum_x <= th_qt || sum_x >= 16*255 - th_qt) {   /* round the whole 4x4 block */
        int x = (sum_x <= th_qt) ? 0 : 255;
        for (r = 0; r < 4; ++r)
            for (c = 0; c < 4; ++c)
                blk[r][c] = x;
        return;
    }

    /* The 4x4 block is not rounded: try its four 2x2 sub-blocks */
    {
        int dis[4], alpha[4], sum_dis = 0, min_dis, j;

        for (i = 0; i < 4; ++i) {               /* 2x2 sub-blocks in raster order */
            int r0 = (i / 2) * 2, c0 = (i % 2) * 2;
            int sum_y = blk[r0][c0] + blk[r0][c0+1] + blk[r0+1][c0] + blk[r0+1][c0+1];
            if (sum_y < 4*255/2) { dis[i] = sum_y;         alpha[i] = 0;   }
            else                 { dis[i] = 4*255 - sum_y; alpha[i] = 255; }
        }

        do {
            min_dis = INT_MAX;
            j = -1;
            for (i = 0; i < 4; ++i)             /* not-yet-rounded 2x2 block with the smallest error */
                if (min_dis > dis[i] && dis[i] > 0) { min_dis = dis[i]; j = i; }
            if (j >= 0 && sum_dis + min_dis <= th_qt) {
                int r0 = (j / 2) * 2, c0 = (j % 2) * 2;
                sum_dis += min_dis;
                blk[r0][c0] = blk[r0][c0+1] = blk[r0+1][c0] = blk[r0+1][c0+1] = alpha[j];
                min_dis = dis[j] = 0;           /* continue with the next 2x2 block */
            }
        } while (min_dis == 0);
    }
}

The bit rate of the alpha plane is controlled by the threshold α_TH. The following parameters are recommended
for possible core experiments requiring lossy shape coding:

• TH_QT = 16 * α_TH
  α_TH = 0 (lossless), 16, 32 and 64

α_TH should be smaller than 128 to prevent undesirable rounding results.
3.2.2 Grey Scale Shape Coding 

A gray-level alpha plane is encoded as its support function and the alpha values on the support. The support
function is encoded by binary shape coding as described in Section 3.2.1, and the alpha values are encoded
as texture with arbitrary shape.

1. The support is obtained by thresholding the gray-level alpha plane at 0.

2. The alpha values are partitioned into 16x16 blocks and encoded in the same way as the luminance (see
Sections 3.3.4.2 and 3.4). The 16x16 blocks of alpha values are hereafter referred to as alpha macroblocks.
The encoded data of an alpha macroblock are appended to the end of the corresponding (texture)
macroblock, or to the end of the encoded macroblock texture in the case of separate motion-texture mode.
The formats and syntax of the encoded alpha macroblocks are described in the following.

• I-VOP and P-VOP

| CODA | CBPA | Alpha Block Data |



For I-VOP, CODA is 1 if the alpha values in the alpha macroblock are all 255, and 0 otherwise. For
P-VOP:

if (alpha_mb_all_opaque) {
    if (colocated_alpha_mb_all_opaque)
        CODA = 1
    else
        CODA = 01
}
else {
    if (MV == 0 && alpha_residue_all_zero)
        CODA = 1
    else
        CODA = 00
}

When CODA is 1 or 01, no more data are included in the bitstream. CBPA (Coded block pattern
for alpha) is the same as CBPY (see Appendix A.1.4). Note that for both I-VOP and P-VOP, the
third column of Table B.5 is used. The alpha block data format is the same as the block data format (see
Appendix A.2). Note that alpha macroblocks and blocks with all-zero alpha values are not included
in the bitstream. The CBPA bit for all-zero alpha blocks is set to zero. All the remaining parameters
needed for encoding and decoding alpha macroblocks are the same as for the texture macroblocks.



• B-VOP

| CODA | MODBA | CBPBA | Alpha Block Data |

CODA is 1 if the alpha values in the alpha macroblock are all 255, in which case no more data are sent to the bitstream. CODA is 0 otherwise. MODBA has the same meaning as MODB (see Appendix B.1.9), and CBPBA is defined similarly to CBPB (see B.1.11) except that it has only four bits.
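The CODA decision rules for I-VOPs and P-VOPs can be sketched in C as below. This is a minimal illustration under stated assumptions, not the VM software: the alpha macroblock is a 16x16 array of 8-bit values, the codes are returned as the bit strings written in the text, and the flags describing the co-located macroblock, the motion vector and the alpha residue are assumed to be computed elsewhere by the encoder. All function and parameter names are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define ALPHA_MB 16

/* True if every alpha value in the 16x16 alpha macroblock equals v. */
static bool alpha_mb_all_equal(unsigned char a[ALPHA_MB][ALPHA_MB],
                               unsigned char v)
{
    for (int r = 0; r < ALPHA_MB; r++)
        for (int c = 0; c < ALPHA_MB; c++)
            if (a[r][c] != v)
                return false;
    return true;
}

/* Returns the CODA bit string for an alpha macroblock, or NULL when the
 * macroblock is entirely transparent and is therefore not included in the
 * bitstream at all. The last three flags are only consulted for P-VOPs. */
static const char *coda_code(unsigned char a[ALPHA_MB][ALPHA_MB],
                             bool p_vop,
                             bool colocated_all_opaque,
                             bool mv_is_zero,
                             bool residue_all_zero)
{
    assert(a != NULL);
    if (alpha_mb_all_equal(a, 0))
        return NULL;                        /* all-zero alpha MB: not coded */

    bool all_opaque = alpha_mb_all_equal(a, 255);
    if (!p_vop)
        return all_opaque ? "1" : "0";      /* I-VOP rule */

    if (all_opaque)
        return colocated_all_opaque ? "1" : "01";
    if (mv_is_zero && residue_all_zero)
        return "1";
    return "00";                            /* CBPA and alpha block data follow */
}
```

Returning the code as a string keeps the sketch close to the notation of the text; a real encoder would write the bits directly to the bitstream.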



3.3 Motion Estimation and Compensation 
In order to perform motion prediction on a per-VOP basis, the motion estimation of the blocks on the VOP borders has to be modified from block matching to polygon matching. Furthermore, a special padding technique, repetitive padding, is required for the reference VOP. The details of these techniques are described in the following sections.

Since VOPs have arbitrary rather than rectangular shapes, and the shapes change over time, some conventions are necessary to ensure the consistency of motion compensation in the VM.

The absolute (frame) coordinate system is used for referencing all VOPs. At each time instance, a bounding rectangle that includes the shape of that VOP, as described in Section 3.1, is defined. The top-left corner of the bounding box, in absolute coordinates, is encoded in the VOP spatial reference.

Thus, the motion vector for a particular feature inside a VOP, e.g. a macroblock, refers to the displacement 
of the feature in absolute coordinates. No alignment of VOP bounding boxes at different time instances is 
performed. 

In addition to the basic motion estimation and compensation mode, two additional modes are supported, namely the unrestricted and the advanced modes. In all three modes, the motion vector search range is up to [-64, 63.5]. The basic mode differs from the unrestricted mode mainly in that it restricts the motion vectors to the bounding box of the VOP. The advanced mode allows multiple motion vectors in one macroblock and overlapped motion compensation. Note that in all three modes, padding of the VOP up to a rectangle is needed for both motion estimation and compensation.

3.3.1 Image Padding Technique 

An image padding technique, repetitive padding, is applied to the reference VOP for motion estimation/compensation and to the texture blocks for DCT coding. In this section, the procedure of repetitive padding is described. The details of how repetitive padding is applied to unrestricted motion estimation/compensation and to residual errors are described in the corresponding sections.

The repetitive padding process consists of five steps. The reconstructed alpha plane is used for repetitive padding.

(1) Consider each undefined pixel outside the object boundary as a zero pixel.

(2) Scan each horizontal line of the original image region. Each scan line may be composed of two kinds of line segments: zero segments, in which all pixels are zero pixels, and non-zero segments, in which all pixels are non-zero. If there are no non-zero segments, do nothing. Otherwise, a particular zero segment lies either between an end point of the scan line and the end point of a non-zero segment, or between the end points of two different non-zero segments. In the first case, fill all of the pixels in the zero segment with the pixel value at the end point of the non-zero segment. In the second case, fill all of the pixels in the zero segment with the average of the pixel values at the two end points.

(3) Scan each vertical line of the original image and apply the procedure described in (2) to each vertical scan line.

(4) If a zero pixel can be filled by both (2) and (3), its final value is the average of the two possible values.

(5) Consider the remaining zero pixels. For each of them, scan horizontally to find the closest non-zero pixel on the same horizontal scan line (if there is a tie, the non-zero pixel to the left of the current pixel is selected), and scan vertically to find the closest non-zero pixel on the same vertical scan line (if there is a tie, the non-zero pixel above the current pixel is selected). Replace the zero pixel by the average of these horizontally and vertically closest non-zero pixels.

Figure 3.3.1 illustrates the outcome of the steps described above.

Figure 3.3.1 Illustration of some steps of repetitive padding (panels: steps (2), (3), (4) and (5)).
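The five padding steps can be sketched in C as follows. This is a minimal illustration, not the VM reference code, and it rests on several assumptions: the plane is a fixed 8x8 array (PAD_W and PAD_H are hypothetical), object pixels are marked by a separate `inside` mask rather than by treating a pixel value of 0 as "outside", averages use truncating integer division, pixel values are non-negative, and step (5) only looks at pixels that are defined after step (4), since the text does not specify these details.

```c
#include <assert.h>
#include <stdbool.h>

#define PAD_W 8
#define PAD_H 8

/* Steps (2)/(3) on one scan line of n pixels: fill each gap of undefined
 * pixels from the neighbouring defined end point(s). */
static void pad_line(const int val[], const bool def[], int n,
                     int out[], bool filled[])
{
    int prev = -1;                               /* last defined pixel */
    for (int k = 0; k <= n; k++) {
        if (k < n && !def[k])
            continue;
        if (!(prev < 0 && k == n)) {             /* line has a defined pixel */
            for (int m = prev + 1; m < k; m++) {
                if (prev < 0)
                    out[m] = val[k];                    /* gap at line start */
                else if (k == n)
                    out[m] = val[prev];                 /* gap at line end */
                else
                    out[m] = (val[prev] + val[k]) / 2;  /* between segments */
                filled[m] = true;
            }
        }
        prev = k;
    }
}

/* Repetitive padding of img in place; inside[y][x] marks object pixels. */
static void repetitive_pad(int img[PAD_H][PAD_W], bool inside[PAD_H][PAD_W])
{
    int hval[PAD_H][PAD_W], vval[PAD_H][PAD_W];
    bool hf[PAD_H][PAD_W] = {{false}}, vf[PAD_H][PAD_W] = {{false}};
    bool done[PAD_H][PAD_W];

    for (int y = 0; y < PAD_H; y++)                       /* step (2) */
        pad_line(img[y], inside[y], PAD_W, hval[y], hf[y]);

    for (int x = 0; x < PAD_W; x++) {                     /* step (3) */
        int col[PAD_H], out[PAD_H];
        bool def[PAD_H], f[PAD_H];
        for (int y = 0; y < PAD_H; y++) {
            col[y] = img[y][x]; def[y] = inside[y][x]; out[y] = 0; f[y] = false;
        }
        pad_line(col, def, PAD_H, out, f);
        for (int y = 0; y < PAD_H; y++) { vval[y][x] = out[y]; vf[y][x] = f[y]; }
    }

    for (int y = 0; y < PAD_H; y++)                       /* step (4) */
        for (int x = 0; x < PAD_W; x++) {
            done[y][x] = true;
            if (inside[y][x])
                continue;
            if (hf[y][x] && vf[y][x]) img[y][x] = (hval[y][x] + vval[y][x]) / 2;
            else if (hf[y][x])        img[y][x] = hval[y][x];
            else if (vf[y][x])        img[y][x] = vval[y][x];
            else                      done[y][x] = false; /* left for step (5) */
        }

    for (int y = 0; y < PAD_H; y++)                       /* step (5) */
        for (int x = 0; x < PAD_W; x++) {
            if (done[y][x])
                continue;
            int h = -1, v = -1;                           /* -1: not found */
            for (int d = 1; d < PAD_W && h < 0; d++) {    /* left wins ties */
                if (x - d >= 0 && done[y][x - d])         h = img[y][x - d];
                else if (x + d < PAD_W && done[y][x + d]) h = img[y][x + d];
            }
            for (int d = 1; d < PAD_H && v < 0; d++) {    /* top wins ties */
                if (y - d >= 0 && done[y - d][x])         v = img[y - d][x];
                else if (y + d < PAD_H && done[y + d][x]) v = img[y + d][x];
            }
            if (h >= 0 && v >= 0) img[y][x] = (h + v) / 2;
            else if (h >= 0)      img[y][x] = h;
            else if (v >= 0)      img[y][x] = v;
        }
}
```

Using a mask instead of the pixel value keeps genuine black pixels inside the object from being mistaken for padding candidates.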



3.3.2 Basic Motion Techniques 

3.3.2.1 Modified Block (Polygon) Matching 

The bounding rectangle of the VOP is first extended on the right-bottom side to multiples of the macroblock size, so that the size of the bounding rectangle of the luminance VOP is a multiple of 16x16, while the size of the chrominance plane is a multiple of 8x8. The alpha value of the extended pixels is set to zero. The macroblocks are formed by dividing the extended bounding rectangle into 16x16 blocks. Zero stuffing is used for these extended pixels. SAD (Sum of Absolute Differences) is used as the error measure. The original alpha plane of the VOP is used to exclude the pixels of the macroblock that are outside the VOP: SAD is computed only for the pixels with a nonzero alpha value. This forms a polygon for a macroblock that includes the VOP boundary. Figure 3.3.2 illustrates an example.




Figure 3.3.2 Polygon matching for an arbitrary shape VOP. 



The reference VOP needs to be padded strictly based on its own shape information. For example, when the 
reference VOP is smaller than the current VOP, the reference is not padded up to the size of the current 
VOP. 
If the 8x8 advanced prediction mode is chosen and all of the pixels in an 8x8 block are transparent (completely outside the VOP), the SAD is set to zero and the motion vector for this 8x8 block is zero. No matching needs to be done for this 8x8 block.

3.3.2.2 Integer pixel motion estimation 

Both 8x8 and 16x16 vectors are obtained from the search algorithm. Only a small amount of additional computation is needed to obtain the 8x8 integer vectors in addition to the 16x16 vectors.

The search is made with integer-pixel displacement on the Y component. The comparisons are made between the incoming block and the displaced block in the previous original VOP. A full search around the original macroblock position is used, with a maximum search area depending on the range provided by the f_code.

$$SAD_N(x,y) = \sum_{i=1,j=1}^{N,N} \left|\mathrm{original}(i,j) - \mathrm{previous}(i+x, j+y)\right| \cdot \left(\alpha_{\mathrm{original}}(i,j) \neq 0\right), \qquad x, y \text{ up to } [-64, 63],\; N = 16 \text{ or } 8$$

For the zero vector, SAD_16(0,0) is reduced to favor the zero vector when there is no significant difference:

$$SAD_{16}(0,0) \leftarrow SAD_{16}(0,0) - \left(\frac{N_B}{2} + 1\right)$$

where N_B is the number of macroblock pixels inside the VOP. The (x,y) pair resulting in the lowest SAD_16 is chosen as the 16x16 integer-pixel motion vector, V0. The corresponding SAD is SAD_16(x,y).
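The masked SAD and the zero-vector bias can be sketched in C as below. This is a minimal illustration under assumed conventions: the three pointers address the top-left pixel of the N x N area (the reference already offset by the candidate displacement) and share one row stride, and the function names are hypothetical. A full search would evaluate `masked_sad` for every candidate (x, y) allowed by the f_code and keep the smallest biased value.

```c
#include <assert.h>
#include <stdlib.h>

/* SAD_N between the current block and a displaced reference block, counting
 * only pixels whose original alpha is nonzero. *nb receives the number of
 * pixels inside the VOP (N_B) when nb is not NULL. */
static int masked_sad(const unsigned char *cur, const unsigned char *ref,
                      const unsigned char *alpha, int stride, int n, int *nb)
{
    assert(n == 16 || n == 8);
    int sad = 0, count = 0;
    for (int i = 0; i < n; i++)
        for (int j = 0; j < n; j++) {
            int k = i * stride + j;
            if (alpha[k] != 0) {              /* pixel belongs to the VOP */
                sad += abs(cur[k] - ref[k]);
                count++;
            }
        }
    if (nb != NULL)
        *nb = count;
    return sad;
}

/* Bias in favour of the zero vector: SAD_16(0,0) - (N_B/2 + 1). */
static int biased_sad16(int sad, int dx, int dy, int nb)
{
    return (dx == 0 && dy == 0) ? sad - (nb / 2 + 1) : sad;
}
```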

Likewise, the (x,y) pairs resulting in the lowest SAD_8(x,y) are chosen to give the four 8x8 vectors V1, V2, V3 and V4. The 8x8-based SAD for the macroblock is

$$SAD_{K\times 8} = \sum_{k=1}^{K} SAD_8(x_k, y_k)$$

where 0 < K <= 4 is the number of 8x8 blocks that do not lie outside the VOP shape. The following rule applies:

$$SAD_{inter} = \min\left(SAD_{16}(x,y),\; SAD_{K\times 8}\right)$$

Instead of a full search, the 8x8 search is centered around the 16x16 vector for the following reasons:

(i) it is faster,

(ii) it generally gives better results because of a better OBMC filtering effect and fewer bits spent on vectors,

(iii) if the Extended MV Range is used, the search range around the motion vector predictor is less limited.



3.3.2.3 INTRA/INTER mode decision 

After the integer-pixel motion estimation, the coder decides whether to use INTRA or INTER prediction in the coding. The following parameters are calculated to make the INTRA/INTER decision:

$$MB\_mean = \frac{1}{N_B} \sum_{i=1,j=1}^{16,16} \mathrm{original}(i,j)$$

$$A = \sum_{i=1,j=1}^{16,16} \left|\mathrm{original}(i,j) - MB\_mean\right| \cdot \left(\alpha_{\mathrm{original}}(i,j) \neq 0\right)$$

INTRA mode is chosen if:

$$A < SAD_{inter} - 2 N_B$$

Notice that if SAD_16(0,0) is used, this is the value that has already been reduced as explained above.

If INTRA mode is chosen, no further operations are necessary for the motion search. If INTER mode is chosen, the motion search continues with a half-sample search around the V0 position.
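The INTRA/INTER decision can be sketched in C as follows. It is a minimal illustration with hypothetical names; it assumes that the sum in MB_mean runs only over pixels inside the VOP (consistent with dividing by N_B) and that MB_mean is computed with truncating integer division, neither of which the text states explicitly.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>

/* Returns true if INTRA should be chosen for a 16x16 macroblock.
 * orig/alpha: 256 luminance and original alpha values in raster order.
 * sad_inter:  min(SAD_16, SAD_Kx8), already including any zero-vector bias. */
static bool choose_intra(const unsigned char orig[256],
                         const unsigned char alpha[256], int sad_inter)
{
    int nb = 0, sum = 0;
    for (int k = 0; k < 256; k++)
        if (alpha[k] != 0) {
            sum += orig[k];
            nb++;
        }
    assert(nb > 0);                     /* macroblock must overlap the VOP */

    int mb_mean = sum / nb;             /* assumption: truncating division */
    int a = 0;
    for (int k = 0; k < 256; k++)
        if (alpha[k] != 0)
            a += abs(orig[k] - mb_mean);

    return a < sad_inter - 2 * nb;
}
```

For a perfectly flat macroblock A is 0, so INTRA wins only when SAD_inter exceeds 2 x N_B.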

3.3.2.4 Half sample search 

Half-sample search is performed for 16x16 vectors as well as for 8x8 vectors. The half-sample search is done using the previous reconstructed VOP. The search is performed on the luminance component of the macroblock, and the search area is ±1 half sample around the target matrix pointed to by V0, V1, V2, V3 or V4. For the 16x16 search, the zero-vector SAD, SAD(0,0), is reduced by N_B/2 + 1 as for the integer search.

The half-sample values are found using the interpolation described in Figure 3.3.3, which corresponds to bilinear interpolation.

Legend: + integer pixel position (A, B, C, D); O half pixel position (a, b, c, d).

a = A,  b = (A + B) // 2

c = (A + C) // 2,  d = (A + B + C + D) // 4

"//" denotes rounded division.

Figure 3.3.3 Interpolation scheme for half sample search.
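The interpolation of Figure 3.3.3 can be sketched in C as below. It assumes non-negative coordinates given in half-sample units relative to the top-left of the plane, and implements "//" for non-negative operands by adding half the divisor before truncating (so halves round up); the function name is hypothetical.

```c
#include <assert.h>

/* Luminance value at half-sample position (hx, hy) in plane p (row stride
 * "stride"): a = A, b = (A+B)//2, c = (A+C)//2, d = (A+B+C+D)//4. */
static int half_sample(const unsigned char *p, int stride, int hx, int hy)
{
    assert(hx >= 0 && hy >= 0);
    int x = hx / 2, y = hy / 2;          /* integer-pixel position of A */
    int fx = hx % 2, fy = hy % 2;        /* half-sample offsets */
    int A = p[y * stride + x];
    if (!fx && !fy)
        return A;                                              /* a */
    if (fx && !fy)
        return (A + p[y * stride + x + 1] + 1) / 2;            /* b */
    if (!fx && fy)
        return (A + p[(y + 1) * stride + x] + 1) / 2;          /* c */
    return (A + p[y * stride + x + 1] + p[(y + 1) * stride + x]
              + p[(y + 1) * stride + x + 1] + 2) / 4;          /* d */
}
```

Neighbouring pixels are read only when the fractional offset requires them, so a position on the last row or column is safe as long as it is an integer position.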



The vector resulting in the best match during the half sample search is named MV. MV consists of horizontal and 
vertical components (MVx, MVy), both measured in half sample units. 

3.3.2.5 Decision on 16x16 or 8x8 prediction mode

SAD for the best half-sample 16x16 vector (including the subtraction of N_B/2 + 1 if the vector is (0,0)):

$$SAD_{16}(x,y)$$

SAD for the whole macroblock for the best half-sample 8x8 vectors:

$$SAD_{K\times 8} = \sum_{k=1}^{K} SAD_8(x_k, y_k)$$

where 0 < K <= 4 is the number of 8x8 blocks that do not lie outside the VOP shape. The following rule applies:

$$\text{if } SAD_{K\times 8} < SAD_{16}(x,y) - \left(\frac{N_B}{2} + 1\right) \text{, choose } 8\times 8 \text{ prediction; otherwise, choose } 16\times 16 \text{ prediction.}$$
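The 16x16/8x8 decision can be sketched in C as follows. This is a minimal illustration with hypothetical names: the per-block SADs and the flags saying which 8x8 blocks lie inside the VOP shape are assumed to come from the half-sample search, and N_B/2 uses integer division.

```c
#include <assert.h>
#include <stdbool.h>

/* Returns true if 8x8 prediction should be used for the macroblock.
 * sad16:      best half-sample SAD_16 (zero-vector bias already applied).
 * sad8[k]:    best half-sample SAD_8 of 8x8 block k.
 * present[k]: block k does not lie outside the VOP shape.
 * nb:         number of macroblock pixels inside the VOP. */
static bool choose_8x8(int sad16, const int sad8[4], const bool present[4],
                       int nb)
{
    int sad_k8 = 0, k_blocks = 0;
    for (int k = 0; k < 4; k++)
        if (present[k]) {
            sad_k8 += sad8[k];
            k_blocks++;
        }
    assert(k_blocks > 0);               /* 0 < K <= 4 */
    return sad_k8 < sad16 - (nb / 2 + 1);
}
```

The N_B/2 + 1 margin means four vectors are only chosen when they clearly beat one vector, which limits the extra motion vector bits.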

3.3.2.6 The motion vector range 
• To be reedited. 

3.3.2.7 Differential coding of motion vectors 

When using INTER mode coding, the motion vector must be transmitted. The motion vector components (horizontal 
and vertical) are coded differentially by using a spatial neighborhood of three motion vectors already transmitted 
(Figure 3.3.4). These three motion vectors are candidate predictors for the differential coding. 

In the special cases at the borders of the current VOP the following decision rules are applied: 

1. If the macroblock of one and only one candidate predictor is outside of the VOP, it is set to zero.

2. If the macroblocks of two and only two candidate predictors are outside of the VOP, they are set to the 
third candidate predictor. 

3. If the macroblocks of all three candidate predictors are outside of the VOP, they are set to zero. 

The motion vector coding is performed separately on the horizontal and vertical components. 

For each component, the median value of the three candidates for the same component is computed: 

Px = Median(MV1x, MV2x, MV3x)
Py = Median(MV1y, MV2y, MV3y)

For instance, if MV1 = (-2,3), MV2 = (1,5) and MV3 = (-1,7), then Px = -1 and Py = 5.

The variable length codes for the vector differences MVDx and MVDy are listed in Table B.7.

MVDx = MVx - Px
MVDy = MVy - Py

For prediction of 8x8 vectors, see Section 3.3.4.
MV: Current motion vector
MV1: Previous motion vector
MV2: Above motion vector
MV3: Above right motion vector

Figure 3.3.4 Motion vector prediction.
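The border rules and the component-wise median can be sketched in C as below. It is a minimal illustration with hypothetical names; deciding whether a candidate macroblock lies outside the VOP is left to the caller.

```c
#include <assert.h>
#include <stdbool.h>

typedef struct { int x, y; } mv_t;

static int median3(int a, int b, int c)
{
    int lo = a < b ? a : b, hi = a < b ? b : a;
    return c < lo ? lo : (c > hi ? hi : c);
}

/* Predictor (Px, Py) from candidates MV1 (left), MV2 (above) and MV3
 * (above right). outside[k] is true when candidate k's macroblock lies
 * outside the VOP; the three border rules are applied before the median. */
static mv_t predict_mv(const mv_t cand[3], const bool outside[3])
{
    mv_t c[3] = {cand[0], cand[1], cand[2]};   /* local copy, input untouched */
    mv_t zero = {0, 0};
    int n_out = outside[0] + outside[1] + outside[2];
    assert(n_out >= 0 && n_out <= 3);

    if (n_out == 1) {                          /* rule 1: that one -> zero */
        for (int k = 0; k < 3; k++) if (outside[k]) c[k] = zero;
    } else if (n_out == 2) {                   /* rule 2: both -> the third */
        mv_t third = zero;
        for (int k = 0; k < 3; k++) if (!outside[k]) third = c[k];
        for (int k = 0; k < 3; k++) if (outside[k]) c[k] = third;
    } else if (n_out == 3) {                   /* rule 3: all -> zero */
        for (int k = 0; k < 3; k++) c[k] = zero;
    }

    mv_t p = { median3(c[0].x, c[1].x, c[2].x),
               median3(c[0].y, c[1].y, c[2].y) };
    return p;
}
```

With the example from the text, MV1 = (-2,3), MV2 = (1,5) and MV3 = (-1,7) give P = (-1,5), so a current vector MV = (0,6) would be sent as MVD = (1,1).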



3.3.3 Unrestricted Motion Estimation/Compensation 
3.3.3.1 Motion vectors over VOP boundaries 

An unrestricted motion estimation mode is used for VOP motion estimation and compensation. This technique improves motion estimation, especially for VOP-based coding schemes. In this technique, the error signal is generated by extending the reference VOP to a sufficient size, padding the extended VOP, applying motion estimation and compensation, and taking the difference between the original and the estimated signals. Note that padding is performed only on the reference VOP. The target VOP remains the same except for extend…
Figure 3.3.5 Illustration of extending the reference VOP (repetitive padding is applied to all transparent (alpha = 0) pixels).

• Pad the extended regions using repetitive padding. Use the padded VOP as the new reference VOP. 

• Apply the modified block (polygon) matching described in Section 3.3.2.1 to compute the motion vectors.
3.3.4 Advanced prediction mode 

3.3.4.1 Formation of the motion vectors

The one/four vectors decision is indicated by the MCBPC codeword for each macroblock. If only one motion vector is transmitted for a macroblock, this is defined as four vectors with the same value. If MCBPC indicates that four motion vectors are transmitted for the current macroblock, the information for the first motion vector is transmitted as the codeword MVD and the information for the three additional motion vectors is transmitted as the codewords MVD2-4.

The vectors are obtained by adding predictors to the vector differences indicated by MVD and MVD2-4, in a similar way as when only one motion vector per macroblock is present, according to the decision rules given in Section 3.3.2.7. Again, the predictors are calculated separately for the horizontal and vertical components. However, the candidate predictors MV1, MV2 and MV3 are redefined as indicated in Figure 3.3.6. If only one vector per macroblock is present, MV1, MV2 and MV3 are defined as for the 8x8 block numbered 1 in Figure B.3.
Figure 3.3.6 Redefinition of the candidate predictors MV1, MV2 and MV3
for each of the luminance blocks in a macroblock

If four vectors are used, each motion vector is used for all pixels in one of the four luminance blocks in the macroblock. The numbering of the motion vectors is equivalent to the numbering of the four luminance blocks as given in Figure B.3. The motion vector MVD_CHR for both chrominance blocks is derived by calculating the sum of the K luminance vectors that correspond to the K 8x8 blocks that do not lie outside the VOP shape, and dividing this sum by 2*K; the component values of the resulting sixteenth/twelfth/eighth/fourth sample resolution vectors are modified towards the nearest half-sample position as indicated in Tables 3.3.1.a/b/c/d.
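For the case K = 3, the derivation of one MVD_CHR component can be sketched in C using the twelfth-sample mapping of Table 3.3.1.b. This is an illustration under stated assumptions, not the VM code: the table is applied to the magnitude and the sign is restored afterwards, and whole chroma samples carry over as two half samples each (the "/12 to /2" column of the table). The other values of K would use Tables 3.3.1.a, c and d in the same way; the function name is hypothetical.

```c
#include <assert.h>
#include <stdlib.h>

/* Table 3.3.1.b: position within a chroma sample, in twelfths, mapped to
 * the resulting half-sample position (0, 1 or 2). */
static const int twelfth_to_half[12] = {0, 0, 1, 1, 1, 1, 1, 1, 1, 1, 2, 2};

/* One component of MVD_CHR in chroma half-sample units. sum is the sum of
 * the K luminance vector components (luminance half-sample units). Dividing
 * by 2*K gives chroma half-sample units, so for K = 3 the result is sum/6
 * half samples, i.e. sum/12 whole samples: sum counts twelfths of a sample. */
static int chroma_component(int sum, int k)
{
    assert(k == 3);                 /* only Table 3.3.1.b is reproduced here */
    int mag = abs(sum);
    int half = 2 * (mag / 12) + twelfth_to_half[mag % 12];
    return sum < 0 ? -half : half;
}
```

For example, luminance components 1, 2 and 2 (sum 5, i.e. 5/12 of a chroma sample) map to one chroma half sample.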



Table 3.3.1.a

Modification of sixteenth sample resolution chrominance vector components

Table 3.3.1.b

Modification of twelfth sample resolution chrominance vector components

twelfth pixel position | 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | 10 | 11 | /12
resulting position     | 0 | 0 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 2  | 2  | /2

Table 3.3.1.c

Modification of eighth sample resolution chrominance vector components

Table 3.3.1.d

Modification of fourth sample resolution chrominance vector components
Half-sample values are found using bilinear interpolation as described in section 6.1.2. The prediction for luminance is obtained by overlapped motion compensation as described in Section 3.3.4.2. The prediction for chrominance is obtained by applying the motion vector MVD_CHR to all pixels in the two chrominance blocks (as is done in the default prediction mode).

The predictor for MVD and MVD2-4 is defined as the median value of the vector components MV1, MV2 and MV3 as defined in Section 3.3.2.7.

3.3.4.2 Overlapped motion compensation for luminance 

Each pixel in an 8x8 luminance prediction block is a weighted sum of three prediction values, divided by 8 (with rounding). In order to obtain the three prediction values, three motion vectors are used: the motion vector of the current luminance block, and two out of four "remote" vectors:

• the motion vector of the block at the left or right side of the current luminance block; 

• the motion vector of the block above or below the current luminance block. 
For each pixel, the remote motion vectors of the blocks at the two nearest block borders are used. This means that for the upper half of the block the motion vector corresponding to the block above the current block is used, while for the lower half of the block the motion vector corresponding to the block below the current block is used (see Figure 3.3.8). Similarly, for the left half of the block the motion vector corresponding to the block at the left side of the current block is used, while for the right half of the block the motion vector corresponding to the block at the right side of the current block is used (see Figure 3.3.9).

The creation of each pixel, P(i,j), in an 8x8 luminance prediction block is governed by the following equation:

$$P(i,j) = \big(q(i,j) \times H_0(i,j) + r(i,j) \times H_1(i,j) + s(i,j) \times H_2(i,j) + 4\big) / 8$$

where q(i,j), r(i,j) and s(i,j) are the pixels from the referenced picture p as defined by

$$q(i,j) = p(i + MV_x^0,\; j + MV_y^0)$$
$$r(i,j) = p(i + MV_x^1,\; j + MV_y^1)$$
$$s(i,j) = p(i + MV_x^2,\; j + MV_y^2)$$

Here, (MV_x^0, MV_y^0) denotes the motion vector for the current block, (MV_x^1, MV_y^1) denotes the motion vector of the block either above or below, and (MV_x^2, MV_y^2) denotes the motion vector of the block either to the left or right of the current block, as defined above.

The matrices H_0(i,j), H_1(i,j) and H_2(i,j) are defined in Figure 3.3.7, Figure 3.3.8 and Figure 3.3.9, where (i,j) denotes the column and row, respectively, of the matrix.

If one of the surrounding blocks was not coded, the corresponding remote motion vector is set to zero. If one of the surrounding blocks was coded in INTRA mode, the corresponding remote motion vector is replaced by the motion vector for the current block. If the current block is at the border of the VOP and therefore a surrounding block is not present, the corresponding remote motion vector is replaced by the current motion vector. In addition, if the current block is at the bottom of the macroblock, the remote motion vector corresponding to an 8x8 luminance block in the macroblock below the current macroblock is replaced by the motion vector for the current block.

The weighting values for the prediction are given in Figure 3.3.7, Figure 3.3.8, and Figure 3.3.9. 
Figure 3.3.7 Weighting values, H0, for prediction with motion vector of current luminance block

Figure 3.3.8 Weighting values, H1, for prediction with motion vectors of the luminance blocks
on top or bottom of current luminance block

Figure 3.3.9 Weighting values, H2, for prediction with motion vectors of the luminance blocks
to the left or right of current luminance block
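The overlapped motion compensation formula can be sketched in C as below. This is a simplified illustration, not the VM implementation: vectors are treated as whole-sample displacements (the VM uses half-sample vectors with bilinear interpolation), the caller is assumed to have already applied the substitution rules for uncoded, INTRA and missing neighbours, and the weight matrices are passed in because the actual H0, H1 and H2 values of Figures 3.3.7 to 3.3.9 are not reproduced here. Names are hypothetical.

```c
#include <assert.h>

typedef struct { int x, y; } mv_t;

/* Weighted sum for one pixel: P = (q*H0 + r*H1 + s*H2 + 4) / 8. */
static int obmc_pixel(int q, int r, int s, int h0, int h1, int h2)
{
    return (q * h0 + r * h1 + s * h2 + 4) / 8;
}

/* OBMC prediction of one 8x8 luminance block at (bx, by) in plane ref.
 * Rows 0-3 use the vector of the block above and rows 4-7 the block below
 * (for r); columns 0-3 use the block to the left and columns 4-7 the block
 * to the right (for s). Arrays are indexed [row j][column i]. */
static void obmc_block(const unsigned char *ref, int stride, int bx, int by,
                       mv_t cur, mv_t above, mv_t below, mv_t left, mv_t right,
                       int H0[8][8], int H1[8][8], int H2[8][8], int out[8][8])
{
    assert(stride > 0);
    for (int j = 0; j < 8; j++)              /* row */
        for (int i = 0; i < 8; i++) {        /* column */
            mv_t v1 = (j < 4) ? above : below;
            mv_t v2 = (i < 4) ? left : right;
            int x = bx + i, y = by + j;
            int q = ref[(y + cur.y) * stride + (x + cur.x)];
            int r = ref[(y + v1.y) * stride + (x + v1.x)];
            int s = ref[(y + v2.y) * stride + (x + v2.x)];
            out[j][i] = obmc_pixel(q, r, s, H0[j][i], H1[j][i], H2[j][i]);
        }
}
```

The "+ 4" term makes the final division by 8 round to nearest rather than truncate.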

3.4 Texture Coding 

The intra VOPs and the residual data after motion compensation are coded using the same 8x8 block DCT scheme. The DCT is done separately for each of the luminance and chrominance planes. When the shape of the VOP is arbitrary, the macroblocks that belong to the arbitrary shape of the VOP are treated as described below. There are two types of macroblocks that belong to an arbitrarily shaped VOP: 1) those that lie completely inside the VOP shape and 2) those that lie on the boundary of the shape. The macroblocks that lie completely inside the VOP are coded using a technique identical to the technique used in H.263. The intra 8x8 blocks that belong to the macroblocks lying on the border of the VOP shape are first padded as described in the motion estimation/compensation section. For padding of chroma blocks, a 16x16 alpha block is decimated by discarding every other row and column of pixels, starting from the second row and second column. For residue blocks, the region outside the VOP within the blocks is padded with zero. Padding is performed separately for each of the luminance and chrominance 8x8 blocks by using the reconstructed alpha values of the luminance or chrominance in this 8x8 block. If all the pixels in an 8x8 block are transparent, their values are replaced by zero. These blocks are then coded in a manner identical to the interior blocks. The macroblocks that do not belong to the arbitrary shape but lie inside the bounding box of a VOP are not coded at all.
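The alpha decimation for chroma padding and the zero padding of residue blocks can be sketched in C as follows. It is a minimal illustration with hypothetical names: discarding every other row and column starting from the second one is read as keeping rows and columns 0, 2, 4, ..., 14.

```c
#include <assert.h>

/* Chroma alpha block for padding: keep rows/columns 0, 2, ..., 14 of the
 * 16x16 alpha block, i.e. discard every other row and column starting
 * from the second one. */
static void decimate_alpha(unsigned char alpha16[16][16],
                           unsigned char alpha8[8][8])
{
    for (int r = 0; r < 8; r++)
        for (int c = 0; c < 8; c++)
            alpha8[r][c] = alpha16[2 * r][2 * c];
}

/* Residue blocks: pixels outside the VOP (alpha == 0) are set to zero. */
static void zero_pad_residue(int res[8][8], unsigned char alpha[8][8])
{
    for (int r = 0; r < 8; r++)
        for (int c = 0; c < 8; c++)
            if (alpha[r][c] == 0)
                res[r][c] = 0;
}
```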
3.4.1 DCT

A separable 2-dimensional Discrete Cosine Transform (DCT) is used. 

3.4.2 H.263 Quantization Method 

The quantization parameter QP may take integer values from 1 to 31. The quantization step size is 2 x QP.

COF     A transform coefficient to be quantized
LEVEL   Absolute value of the quantized version of the transform coefficient
COF'    Reconstructed transform coefficient

Quantization:

$$\text{INTRA: } LEVEL = |COF| / (2 \times QP)$$
$$\text{INTER: } LEVEL = (|COF| - QP/2) / (2 \times QP)$$

Dequantization:

$$|COF'| = \begin{cases} 0, & LEVEL = 0 \\ 2 \times QP \times LEVEL + QP, & LEVEL \neq 0,\ QP \text{ odd} \\ 2 \times QP \times LEVEL + QP - 1, & LEVEL \neq 0,\ QP \text{ even} \end{cases}$$

The sign of COF is then added to obtain COF': COF' = Sign(COF) x |COF'|.

The DC coefficient of an INTRA block is quantized as described below. 8 bits are used for the quantized DC coefficient.

Quantization:

LEVEL = COF // 8

Dequantization:

COF' = LEVEL x 8
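The H.263 quantizer and dequantizer can be sketched in C as below. This is a minimal illustration with hypothetical names; it assumes that "/" truncates (as C's integer division does for the non-negative magnitudes used here) and that "//" for the INTRA DC is rounding to nearest with halves rounded up for the non-negative DC value.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>

/* LEVEL (magnitude) for a coefficient, INTRA AC or INTER rule. */
static int h263_level(int cof, int qp, bool intra)
{
    assert(qp >= 1 && qp <= 31);
    int mag = abs(cof);
    int level = intra ? mag / (2 * qp) : (mag - qp / 2) / (2 * qp);
    return level > 0 ? level : 0;        /* dead zone for small INTER values */
}

/* Reconstructed coefficient COF' from LEVEL and the sign of COF. */
static int h263_dequant(int level, int qp, int sign)
{
    assert(qp >= 1 && qp <= 31);
    if (level == 0)
        return 0;
    int mag = 2 * qp * level + ((qp % 2) ? qp : qp - 1);
    return sign < 0 ? -mag : mag;
}

/* INTRA DC: LEVEL = COF // 8 and COF' = LEVEL x 8. */
static int h263_intra_dc_level(int cof)
{
    assert(cof >= 0);
    return (cof + 4) / 8;
}

static int h263_intra_dc_recon(int level) { return level * 8; }
```

For QP = 10 and COF = -105, both rules give LEVEL = 5, and because QP is even the reconstruction is -(2 x 10 x 5 + 9) = -109.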

3.4.3 MPEG Quantization Method 
3.4.3.1 Quantization of Intra Macroblocks 

DC Coefficient

The quantizer step size for the DC coefficient of the luminance and chrominance components is 8. Thus the quantized DC value, QDC, is calculated as:

QDC = dc // 8

where "dc" is the 11-bit unquantized value from the DCT.

AC Coefficients

AC coefficients ac[i][j] are first quantized by individual quantization factors:

ac~[i][j] = (16 * ac[i][j]) // wI[i][j]

where wI[i][j] is the [i][j]th element of the default intra quantizer matrix, which for this VM carries a value of 16 except at the DC coefficient location, which carries a value of 8; this is referred to as the flat matrix. The resulting ac~[i][j] is limited to the range [-2048, 2047].

An example of a non-flat intra quantization matrix, which can alternatively be used as the default, is provided for guidance in Figure 3.4.3.1; if it is used, this should be clearly stated in core experiment comparisons against the VM.

Figure 3.4.3.1 - Example Intra quantizer matrix

The step size for quantizing the scaled DCT coefficients, ac~[i][j], is derived from the quantization parameter, QP.

The quantized level QAC[i][j] is given by:

QAC[i][j] = (ac~[i][j] + sign(ac~[i][j]) * ((p * QP) // q)) / (2 * QP)

where QAC[i][j] is limited to the range [-127, 127].
For this VM, p = 3 and q = 4.
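The intra quantization steps can be sketched in C as follows. This is an illustration under assumptions: "//" is implemented as division rounded to the nearest integer with halves rounded away from zero (the usual MPEG convention), "/" truncates toward zero as in C, and the helper names are hypothetical.

```c
#include <assert.h>
#include <stdlib.h>

/* "//": integer division rounded to nearest, halves away from zero (b > 0). */
static int rdiv(int a, int b)
{
    assert(b > 0);
    return a >= 0 ? (a + b / 2) / b : -((-a + b / 2) / b);
}

static int clampi(int v, int lo, int hi)
{
    return v < lo ? lo : (v > hi ? hi : v);
}

/* Quantized intra DC: QDC = dc // 8. */
static int mpeg_intra_dc(int dc) { return rdiv(dc, 8); }

/* Quantized intra AC coefficient QAC[i][j] for weight w = wI[i][j]. */
static int mpeg_intra_ac(int ac, int w, int qp)
{
    const int p = 3, q = 4;                             /* values for this VM */
    int scaled = clampi(rdiv(16 * ac, w), -2048, 2047); /* ac~[i][j] */
    int sign = (scaled > 0) - (scaled < 0);
    int qac = (scaled + sign * rdiv(p * qp, q)) / (2 * qp);
    return clampi(qac, -127, 127);
}
```

With the flat matrix (w = 16) and QP = 10, an AC value of 100 becomes (100 + 8) / 20 = 5, since (3 x 10) // 4 rounds 7.5 up to 8.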



3.4.3.2 Quantization of Non intra Macroblocks 

Forward Quantization:

Non-intra macroblocks in P- and B-VOPs are quantized with a uniform quantizer that has a dead zone about zero. The default quantization matrix carries a value of 16 for each entry and is referred to as the flat matrix.

An example of a non-flat non-intra quantization matrix, which can alternatively be used as the default, is provided for guidance in Figure 3.4.3.2; if it is used, this should be clearly stated in core experiment comparisons against the VM.

Figure 3.4.3.2 - Non-intra quantizer matrix

The step size for quantizing both the scaled DC and AC coefficients is derived from the quantization parameter, QP.

ac~[i][j] = (16 * ac[i][j]) // wN[i][j]

where wN[i][j] is the non-intra quantizer matrix, and

QAC[i][j] = ac~[i][j] / (2 * QP)

QAC[i][j] is limited to the range [-128, 127].
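A matching sketch for non-intra coefficients is shown below, under the same assumptions as for the intra case ("//" rounds halves away from zero, "/" truncates toward zero) and with hypothetical names. The truncating division by 2 x QP is what produces the dead zone: any scaled value with magnitude below 2 x QP quantizes to zero.

```c
#include <assert.h>

/* "//": integer division rounded to nearest, halves away from zero (b > 0). */
static int rdiv(int a, int b)
{
    assert(b > 0);
    return a >= 0 ? (a + b / 2) / b : -((-a + b / 2) / b);
}

static int clampi(int v, int lo, int hi)
{
    return v < lo ? lo : (v > hi ? hi : v);
}

/* Non-intra quantization of one coefficient with weight w = wN[i][j]. */
static int mpeg_nonintra(int ac, int w, int qp)
{
    assert(qp >= 1);
    int scaled = rdiv(16 * ac, w);                 /* ac~[i][j] */
    return clampi(scaled / (2 * qp), -128, 127);   /* QAC[i][j] */
}
```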

3.4.3.3 Inverse Quantization of Intra and Non Intra Macroblocks 

QF[v][u] are the levels decoded from the bitstream for a block. The following code describes the process of inverse quantization, producing the reconstructed coefficients F[v][u]. Two main cases are distinguished: one when the macroblock being decoded is intra and the other when it is non-intra.

void inverse_quantize(int QF[8][8], int F[8][8], int wI[8][8], int wN[8][8],
                      int QP, int mb_intra)
{
    for (int v = 0; v < 8; v++) {
        for (int u = 0; u < 8; u++) {
            if (u == 0 && v == 0 && mb_intra) {
                F[v][u] = 8 * QF[0][0];                       /* intra DC: step size 8 */
            } else if (mb_intra) {
                F[v][u] = (QF[v][u] * wI[v][u] * QP * 2) / 16;
            } else {
                int sign = (QF[v][u] > 0) - (QF[v][u] < 0);  /* Sign(QF[v][u]) */
                F[v][u] = (((QF[v][u] * 2) + sign) * wN[v][u] * QP) / 16;
            }
        }
    }
}

3.4.3.4 DC Prediction of DC coefficients in Intra Macroblocks 

After the DC coefficient of a block has been quantized to 8 bits, it is coded losslessly by a DPCM technique. Coding of the luminance blocks within a macroblock follows the normal scan of Figure B.3. Thus the DC value of block 4 becomes the DC predictor for block 1 of the following macroblock, resulting in the zigzag scan order shown in Figure 3.4.3.3. The process for DC prediction of DC coefficients in intra macroblocks is as follows:

1. Separate predictors are used for Y, U and V.

2. Predictors are initialized to 128 before the first macroblock at the left edge of each VOP macroblock row.

3. For U/V, the predictor for the current macroblock is given by the U/V DC value in the macroblock to the left. This predictor is set to 128 if the macroblock to the left is a P macroblock or is fully outside the VOP boundary.

4. For each Y macroblock, the DC predictor is given by the "last used" DC value in the macroblock to the left. This predictor is set to 128 if the macroblock to the left is a P macroblock or contains no Y blocks at all (fully outside the VOP boundary). The "last used" DC value is defined to be the DC value of the block with the highest block number that is not fully outside the VOP.

5. For Y, intra DC prediction proceeds in block order 1, 2, 3, 4 (a zigzag). The first block that is not fully outside the VOP is predicted using the predictor from step 4. Each subsequent block that is present is predicted from the previous block that is present. Note that DC predictors are not reset to 128 within a macroblock, because any other block will generally provide a better predictor than 128.
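Steps 4 and 5 for the luminance blocks of one intra macroblock can be sketched in C as below. It is a minimal illustration with hypothetical names: the caller is assumed to apply steps 2 and 4 when choosing `left_pred` (128 at the start of a row, or when the left macroblock is a P macroblock or has no Y blocks), and the differences produced here are what Tables T1 to T3 then encode.

```c
#include <assert.h>
#include <stdbool.h>

#define DC_RESET 128

/* DPCM of the four Y DC values (blocks 1-4 in Figure B.3 order) of one intra
 * macroblock. present[k] is false for a block fully outside the VOP; such a
 * block is not coded, and diff[k] is set to 0 only as a placeholder.
 * Returns the "last used" DC value, i.e. the predictor for the next MB. */
static int y_dc_dpcm(const int dc[4], const bool present[4], int left_pred,
                     int diff[4])
{
    assert(left_pred >= 0 && left_pred <= 255);
    int pred = left_pred;
    bool any = false;
    for (int k = 0; k < 4; k++) {
        if (!present[k]) {
            diff[k] = 0;
            continue;
        }
        diff[k] = dc[k] - pred;     /* differential DC to be coded */
        pred = dc[k];               /* next present block predicts from this one */
        any = true;
    }
    return any ? pred : DC_RESET;   /* no Y blocks: the next MB uses 128 */
}
```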




Figure 3.4.3.3 Zigzag scanning of macroblocks (MB 1 | MB 2 | MB 3) for DC prediction; the last block of each MB provides the DC predictor for the next MB.



At the decoder, the original quantized DC values are exactly recovered by following the inverse procedure. 

The differential DC values thus generated are categorized according to their "size" as shown in Tables T1 and T2.

Table T1 - Variable length codes for DC size luminance

VLC code     DC size luminance
100          0
00           1
01           2
101          3
110          4
1110         5
11110        6
111110       7
1111110      8

Table T2 - Variable length codes for DC size chrominance

VLC code     DC size chrominance
00           0
01           1
10           2
110          3
1110         4
11110        5
111110       6
1111110      7
11111110     8


For each category, additional bits are appended to the SIZE code to uniquely identify which difference in that category actually occurred (Table T3). The additional bits thus define the signed amplitude of the difference data. The number of additional bits (sign included) is equal to the SIZE value.

Table T3 - Differential DC additional codes

DIFFERENTIAL DC    SIZE    ADDITIONAL CODE
-255 to -128       8       00000000 to 01111111
-127 to -64        7       0000000 to 0111111
-63 to -32         6       000000 to 011111
-31 to -16         5       00000 to 01111
-15 to -8          4       0000 to 0111
-7 to -4           3       000 to 011
-3 to -2           2       00 to 01
-1                 1       0
0                  0
1                  1       1
2 to 3             2       10 to 11
4 to 7             3       100 to 111
8 to 15            4       1000 to 1111
16 to 31           5       10000 to 11111
32 to 63           6       100000 to 111111
64 to 127          7       1000000 to 1111111
128 to 255         8       10000000 to 11111111
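The SIZE category and the additional bits of Table T3 follow a simple rule that can be sketched in C. SIZE is the number of bits needed for the magnitude; positive differences are sent as-is, and negative differences as the difference plus 2^SIZE - 1, so that they start with a 0 bit. The function names are hypothetical, and only the luminance VLC of Table T1 is included.

```c
#include <assert.h>
#include <stdlib.h>

/* SIZE category of a differential DC value: bits needed for |diff|. */
static int dc_size(int diff)
{
    assert(diff >= -255 && diff <= 255);
    int mag = abs(diff), size = 0;
    while (mag > 0) {
        size++;
        mag >>= 1;
    }
    return size;
}

/* Additional code (SIZE bits, none for diff == 0). */
static unsigned dc_additional_bits(int diff)
{
    int size = dc_size(diff);
    return (unsigned)(diff >= 0 ? diff : diff + (1 << size) - 1);
}

/* Luminance SIZE code from Table T1. */
static const char *dc_size_vlc_luma(int size)
{
    static const char *const t1[9] = {"100", "00", "01", "101", "110",
                                      "1110", "11110", "111110", "1111110"};
    assert(size >= 0 && size <= 8);
    return t1[size];
}
```

For example, a luminance difference of -5 has SIZE 3 and is sent as 101 followed by the additional code 010.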



3.4.4 VLC encoding of quantized transform coefficients

The 8x8 blocks of transform coefficients are scanned with "zigzag" scanning as listed in Figure 3.4.4.1.

 1  2  6  7 15 16 28 29
 3  5  8 14 17 27 30 43
 4  9 13 18 26 31 42 44
10 12 19 25 32 41 45 54
11 20 24 33 40 46 53 55
21 23 34 39 47 52 56 61
22 35 38 48 51 57 60 62
36 37 49 50 58 59 63 64

Figure 3.4.4.1 Zigzag scanning pattern.

A three-dimensional variable length code is used to code transform coefficients. An EVENT is a combination of three parameters:

LAST    0: There are more nonzero coefficients in the block.
        1: This is the last nonzero coefficient in the block.

RUN     Number of zero coefficients preceding the current nonzero coefficient.

LEVEL   Magnitude of the coefficient.

The most commonly occurring combinations of (LAST, RUN, LEVEL) are coded with variable length codes given in Appendix B. The remaining combinations of (LAST, RUN, LEVEL) are coded with a 22-bit word consisting of:

ESCAPE  7 bits
LAST    1 bit (0: not last coefficient, 1: last nonzero coefficient)
RUN     6 bits
LEVEL   8 bits

The code words for these fixed-length ESCAPE codes are described in Appendix B.
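The zigzag scan of Figure 3.4.4.1 and the formation of (LAST, RUN, LEVEL) events can be sketched in C as below. This is an illustration with hypothetical names: it scans all 64 coefficients as for an INTER block (for an INTRA block the separately coded DC would be skipped), and it keeps LEVEL signed for convenience, whereas the text defines LEVEL as the magnitude.

```c
#include <assert.h>

typedef struct { int last, run, level; } event_t;

/* order[k] = raster index (row * 8 + col) of the k-th coefficient in the
 * zigzag scan (k = 0 corresponds to entry 1 of Figure 3.4.4.1). */
static void zigzag_order(int order[64])
{
    int k = 0;
    for (int s = 0; s < 15; s++) {                 /* s = row + col */
        int start = s < 8 ? s : 7;
        if (s % 2 == 0) {                          /* walk up and to the right */
            for (int row = start; row >= 0 && s - row < 8; row--)
                order[k++] = row * 8 + (s - row);
        } else {                                   /* walk down and to the left */
            for (int col = start; col >= 0 && s - col < 8; col--)
                order[k++] = (s - col) * 8 + col;
        }
    }
    assert(k == 64);
}

/* Convert a block of quantized coefficients (raster order) into events;
 * returns the number of events written to ev. */
static int make_events(const int block[64], event_t ev[64])
{
    int order[64];
    zigzag_order(order);
    int n = 0, run = 0;
    for (int k = 0; k < 64; k++) {
        int c = block[order[k]];
        if (c == 0) {
            run++;                                 /* count preceding zeros */
            continue;
        }
        ev[n].last = 0;
        ev[n].run = run;
        ev[n].level = c;
        n++;
        run = 0;
    }
    if (n > 0)
        ev[n - 1].last = 1;                        /* last nonzero coefficient */
    return n;
}
```

Each event is then looked up in the VLC table of Appendix B, falling back to the 22-bit ESCAPE word when it is not listed.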



3.5. Prediction and Coding of B-VOPs 

Macroblocks in B-VOPs can be coded either using H.263-like B-block coding or by MPEG-1-like B-picture macroblock coding. The main difference is in the amount of motion vector and quantization related overhead needed. The MBTYPE with H.263-like B-block coding is referred to as direct prediction; in addition, the forward, backward and interpolated prediction modes of MPEG-1 B-pictures are supported. The syntax and semantics of the macroblock and block layers for B-VOPs are presented in Appendix B. The encoding issues for B-VOPs are discussed next.

3.5.1. Direct Coding 

This coding mode uses direct bidirectional motion compensation, derived by extending the H.263 approach of taking P-picture macroblock motion vectors and scaling them to derive forward and backward motion vectors for macroblocks in the B-picture. This is the only mode that makes it possible to use motion vectors on 8x8 blocks; of course, this is only possible when the co-located macroblock in the following P-VOP uses the 8x8 MV mode. As in H.263, using B-frame syntax, only one delta motion vector is allowed per macroblock. Figure 3.5.1 shows the scaling of motion vectors.

The first extension of the H.263 approach is that bidirectional predictions can be made for a full block/macroblock, as in MPEG-1. The second extension of H.263 is that, instead of allowing interpolation of only one intervening VOP, more than one VOP can be interpolated. Of course, if the prediction is poor due to fast motion or a large interframe distance, other motion compensation modes can be chosen.
MV_F = MV/3 + MV_D
MV_B = -(2MV)/3               if MV_D is zero
MV_B = (MV/3 + MV_D) - MV     if MV_D is nonzero
Note: MV_D is the delta vector given by MVD

Figure 3.5.1 Direct Bidirectional Prediction

3.5.1.1. Calculation of vectors

The calculation of the forward and backward motion vectors involves linear scaling of the motion vector of the co-located block in the temporally next P-VOP, followed by correction by a delta vector, and is thus practically identical to the procedure followed in H.263. The only slight change is that here we are dealing with VOPs instead of pictures, and instead of only a single B-picture between a pair of reference pictures, multiple B-VOPs are allowed between a pair of reference VOPs. As in H.263, the temporal reference of the B-VOP relative to the difference in the temporal references of the pair of reference VOPs is used to determine scale factors for computing motion vectors, which are corrected by the delta vector.

The forward and backward motion vectors, MV_F and MV_B, are given in half-sample units as follows:

$$MV_F = \frac{TR_B \times MV}{TR_D} + MV_D$$

$$MV_B = \frac{(TR_B - TR_D) \times MV}{TR_D} \quad \text{if } MV_D = 0$$

$$MV_B = MV_F - MV \quad \text{if } MV_D \neq 0$$

where MV is the direct motion vector of a macroblock in the P-VOP with respect to a reference VOP, TR_B is the difference in temporal reference between the B-VOP and the previous reference VOP, and TR_D is the difference in temporal reference between the temporally next reference VOP and the temporally previous reference VOP, with B-VOPs or skipped VOPs possibly in between.
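The vector derivation above can be sketched per vector component as follows. This is a minimal Python illustration, not VM reference code: the helper names are hypothetical, and it assumes that "/" means integer division with truncation toward zero, as in H.263. The horizontal and vertical components are processed independently by calling the function once for each.

```python
def div_trunc(a: int, b: int) -> int:
    """Integer division truncating toward zero (assumed H.263-style '/')."""
    q = abs(a) // abs(b)
    return q if (a >= 0) == (b > 0) else -q


def direct_mvs(mv: int, mvd: int, trb: int, trd: int) -> tuple:
    """Return (MV_F, MV_B) for one component, all values in half-sample units.

    mv  -- co-located P-VOP vector component (MV)
    mvd -- delta vector component (MV_D)
    trb -- TR_B, B-VOP distance from the previous reference VOP
    trd -- TR_D, distance between the two reference VOPs (must be > 0)
    """
    mv_f = div_trunc(trb * mv, trd) + mvd
    if mvd == 0:
        mv_b = div_trunc((trb - trd) * mv, trd)
    else:
        mv_b = mv_f - mv
    return mv_f, mv_b


# The Figure 3.5.1 case: TR_B = 1, TR_D = 3.
print(direct_mvs(6, 0, 1, 3))   # (2, -4)
print(direct_mvs(6, 1, 1, 3))   # (3, -3)
```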



3.5.1.2. Generating Prediction Block 

Generating a prediction block is straightforward: the computed forward and backward motion vectors are used to obtain the appropriate blocks from the reference VOPs, and these blocks are averaged. Irrespective of whether the direct prediction motion vectors are derived by scaling a single motion vector or four 8x8 motion vectors per macroblock, motion compensation is performed individually on 8x8 blocks to generate a macroblock. If only a single motion vector was available for computing the direct prediction motion vectors of a macroblock, it is simply repeated for each of the 8x8 blocks forming the macroblock. The main difference from H.263 is that there is no constraint on the region within a block that can be bidirectionally predicted; each entire macroblock can be bidirectionally predicted.
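A minimal sketch of the two steps described above: repeating a single macroblock vector over the four 8x8 blocks, and averaging the forward and backward predictions. The function names are hypothetical, blocks are plain lists of sample rows, and the round-half-up averaging `(f + b + 1) // 2` is an assumption borrowed from MPEG-1/2 interpolated prediction rather than something this section specifies. Fetching the (possibly half-sample) reference blocks is not shown.

```python
def block_vectors(mb_vectors: list) -> list:
    """Return one vector per 8x8 block (four per macroblock).

    A single macroblock vector is repeated for all four blocks; four
    vectors (8x8 MV mode in the co-located P-VOP macroblock) are used as-is.
    """
    if len(mb_vectors) == 1:
        return mb_vectors * 4
    if len(mb_vectors) == 4:
        return list(mb_vectors)
    raise ValueError("expected 1 or 4 motion vectors")


def average_blocks(fwd: list, bwd: list) -> list:
    """Average forward and backward prediction blocks sample by sample."""
    return [[(f + b + 1) // 2 for f, b in zip(row_f, row_b)]
            for row_f, row_b in zip(fwd, bwd)]


print(block_vectors([(2, -1)]))                # [(2, -1), (2, -1), (2, -1), (2, -1)]
print(average_blocks([[10, 11]], [[12, 12]]))  # [[11, 12]]
```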

The direct coding mode does not allow a quantizer change; the quantizer value of the previously coded macroblock is therefore used.

3.5.2. Forward Coding 

Forward coding mode uses forward motion compensation in the same manner as MPEG-1/2, except that a VOP is used for prediction instead of a picture. Only one motion vector, in half-sample units, is employed for the 16x16 macroblock being coded. Chrominance vectors are derived by scaling the luminance vectors, as in MPEG-1/2.

This coding mode also allows the quantizer to be switched from the one previously in use. Specifying DQUANT, a differential quantizer, involves a 2-bit overhead, as discussed earlier.

3.5.3. Backward Coding

Backward coding mode uses backward motion compensation in the same manner as MPEG-1/2, except that a VOP is used for prediction instead of a picture. Only one motion vector, in half-sample units, is employed for the 16x16 macroblock being coded. Chrominance vectors are derived by scaling the luminance vectors, as in MPEG-1/2.

This coding mode also allows the quantizer to be switched from the one previously in use. Specifying DQUANT, a differential quantizer, involves a 2-bit overhead, as discussed earlier.

3.5.4. Bidirectional Coding 

Bidirectional coding mode uses interpolated motion compensation in the same manner as MPEG-1/2, except that VOPs are used for prediction instead of pictures. Two motion vectors, in half-sample units, are employed for the 16x16 macroblock being coded. Chrominance vectors are derived by scaling the luminance vectors, as in MPEG-1/2.

This coding mode also allows the quantizer to be switched from the one previously in use. Specifying DQUANT, a differential quantizer, involves a 2-bit overhead, as discussed earlier.

3.5.5. Mode Decisions 

Since a macroblock in a B-VOP can be coded in any one of four modes, the encoder has to decide which mode is best. At the encoder, the motion-compensated prediction is calculated for each of the four modes. Then, for each motion-compensated prediction macroblock, the SAD (sum of absolute differences) between it and the macroblock to be coded is computed. The MBTYPE mode is selected as follows:

if (SAD_direct - NB/2 + 1 <= min{SAD_interpolate, SAD_backward, SAD_forward})
    direct mode
else if (SAD_interpolate <= min{SAD_direct, SAD_backward, SAD_forward})
    interpolate mode
else if (SAD_backward <= min{SAD_direct, SAD_interpolate, SAD_forward})
    backward mode
else
    forward mode
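The selection rule above can be written as a small function. This is a sketch, not the VM's encoder: `nb` stands for the NB term in the rule (its definition is not repeated in this section), and the `sad` helper simply sums absolute differences over two equally sized blocks.

```python
def sad(block: list, pred: list) -> int:
    """Sum of absolute differences between a macroblock and its prediction."""
    return sum(abs(a - b) for row_a, row_b in zip(block, pred)
               for a, b in zip(row_a, row_b))


def choose_mbtype(sad_direct, sad_interpolate, sad_backward, sad_forward, nb):
    """Apply the B-VOP MBTYPE decision rule; ties favour the earlier mode."""
    if sad_direct - nb / 2 + 1 <= min(sad_interpolate, sad_backward, sad_forward):
        return "direct"
    if sad_interpolate <= min(sad_direct, sad_backward, sad_forward):
        return "interpolate"
    if sad_backward <= min(sad_direct, sad_interpolate, sad_forward):
        return "backward"
    return "forward"
```

Because NB/2 - 1 is subtracted from SAD_direct before the comparison, direct mode is chosen even when its SAD exceeds the best of the other three by up to NB/2 - 1.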

3.5.6. Motion Vector Coding 

Motion vectors are coded differentially, using the same differential motion vector coding method as MPEG-1/2. All predictors are reset at the left edge of a VOP. Depending on the macroblock type, either one or both predictors may be updated; predictors that are not updated are carried through. For macroblocks coded in direct bidirectional prediction mode, the forward and backward motion vectors computed for block prediction are used as the forward and backward motion vector predictors.
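The predictor bookkeeping described above can be sketched as follows. This is an illustrative model with hypothetical names: it only tracks which predictor each macroblock type updates and returns the plain differences, omitting the MPEG-1/2 variable-length coding, the f_code range handling and the coding of the direct-mode delta vector. It also assumes that "reset at the left edge of a VOP" means resetting both predictors to zero at the start of each macroblock row.

```python
Vector = tuple  # (horizontal, vertical), in half-sample units


def _diff(v: Vector, p: Vector) -> Vector:
    return (v[0] - p[0], v[1] - p[1])


class BVopMvPredictors:
    """Forward/backward motion vector predictors for one B-VOP."""

    def __init__(self) -> None:
        self.reset()

    def reset(self) -> None:
        """Reset both predictors, e.g. at the left edge of the VOP."""
        self.fwd: Vector = (0, 0)
        self.bwd: Vector = (0, 0)

    def code(self, mode: str, fwd: Vector = None, bwd: Vector = None) -> list:
        """Return the differential vectors to transmit and update predictors."""
        if mode == "direct":
            # The computed direct-mode vectors become the new predictors.
            self.fwd, self.bwd = fwd, bwd
            return []
        diffs = []
        if mode in ("forward", "interpolate"):
            diffs.append(_diff(fwd, self.fwd))
            self.fwd = fwd
        if mode in ("backward", "interpolate"):
            diffs.append(_diff(bwd, self.bwd))
            self.bwd = bwd
        return diffs
```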
3.6 Rate Control

The rate control is the one used in the anchors for the November 1996 test (DOC M0322), applied to each VOP independently.



3.7 Generalized Scalable Encoding 

Because many applications require video to be available simultaneously for decoding at a variety of resolutions or qualities, this VM supports scalability. In general, scalability of video means the ability to achieve video of more than one resolution and/or quality simultaneously. Scalable video coding involves generating a coded representation (bitstream) in a manner that facilitates the derivation of video of more than one resolution and/or quality by scalable decoding. Bitstream scalability is the property of a bitstream that allows decoding of appropriate subsets of the bitstream to generate complete pictures of a resolution and/or quality commensurate with the proportion of the bitstream decoded. If a bitstream is truly scalable, decoders of different complexities, from low-performance to high-performance decoders, can coexist: low-performance decoders may decode only small portions of the bitstream, producing basic quality, while high-performance decoders may decode much more and produce significantly higher quality.

The two main types of scalability are spatial scalability and temporal scalability. Spatial scalability offers scalability of the spatial resolution, and temporal scalability offers scalability of the temporal resolution. Each type of scalability involves more than one layer. In the case of two layers, consisting of a lower layer and a higher layer, the lower layer is referred to as the base layer and the higher layer is called the enhancement layer. Traditionally, these scalabilities are applied to frames of video: in spatial scalability, the enhancement-layer frames enhance the spatial resolution of the base-layer frames, while in temporal scalability, the enhancement-layer frames are temporally multiplexed with the base-layer frames to provide video of higher temporal resolution. Many MPEG-4 applications are, however, even more demanding and require not only traditional frame-based scalabilities but also scalabilities of VOPs of arbitrary shape.

The scalability framework discussed in this VM is referred to as generalized scalability and includes both spatial and temporal scalability. For temporal scalability, this VM supports both frames (rectangular VOPs) and arbitrarily shaped VOPs; for spatial scalability, however, only rectangular VOPs are presently supported. Figure 3.7.1 shows a high-level codec structure for generalized scalability.
Figure 3.7.1 A High Level Codec Structure for Generalized Scalability.

Video VOPs (rectangular or otherwise) are input to the Scalability Preprocessor. If spatial scalability is to be performed, with the base layer at a lower spatial resolution and the enhancement layer at a higher spatial resolution, this preprocessor performs spatial downsampling of the input VOPs to generate in_0, which forms the input to the MPEG-4 Base Layer Encoder; this encoder performs nonscalable encoding. The reconstructed VOPs from the base layer are then fed to Midprocessor1, which in this case performs spatial upsampling. The other output of the Preprocessor corresponds to the higher-spatial-resolution layer VOPs and forms the input (in_1) to the MPEG-4 Enhancement Layer Encoder. The base- and enhancement-layer bitstreams are multiplexed by the MSDL Mux and either stored or transmitted; they can then be retrieved by an MSDL Demux for decoding by the corresponding MPEG-4 Base Layer Decoder and MPEG-4 Enhancement Layer Decoder. The operation of Midprocessor1 at the decoder is identical to that at the encoder. The Scalability Postprocessor performs any necessary operations, such as spatial upsampling of the decoded base layer for display, resulting in outp_0, while the enhancement layer may be output without upsampling as outp_1.

When the generalized codec is used to perform temporal scalability, the Scalability Preprocessor performs temporal demultiplexing of a VO into two substreams of VOPs, one of which (in_0) is input to the MPEG-4 Base Layer Encoder and the other (in_2) to the MPEG-4 Enhancement Layer Encoder. In this case, Midprocessor1 does not perform any spatial resolution conversion; it simply passes the decoded base-layer VOPs through, and these VOPs are used for temporal prediction in encoding the enhancement layer. The MSDL Mux and MSDL Demux operate exactly as in the case of spatial scalability. The base- and enhancement-layer bitstreams are decoded in the corresponding base- and enhancement-layer decoders, as shown. The Postprocessor simply outputs the base-layer VOPs without any conversion, but temporally multiplexes the base- and enhancement-layer VOPs to produce the higher temporal resolution enhancement-layer output.
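The temporal demultiplexing and remultiplexing performed by the Scalability Preprocessor and Postprocessor can be sketched as below. The sketch assumes, purely for illustration, that every second VOP goes to the base layer (the VM does not fix this ratio here), and VOPs are modelled as hypothetical (time, payload) pairs.

```python
def temporal_demux(vops: list, base_period: int = 2):
    """Split (time, vop) pairs: every base_period-th VOP goes to the base layer."""
    base, enhancement = [], []
    for index, item in enumerate(vops):
        (base if index % base_period == 0 else enhancement).append(item)
    return base, enhancement


def temporal_mux(base: list, enhancement: list) -> list:
    """Interleave decoded base and enhancement VOPs back into time order."""
    return sorted(base + enhancement, key=lambda item: item[0])


vops = [(t, f"vop{t}") for t in range(6)]
base, enh = temporal_demux(vops)
print([t for t, _ in base], [t for t, _ in enh])  # [0, 2, 4] [1, 3, 5]
```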

As mentioned earlier, since VOPs can have a rectangular shape (frame) or an irregular shape, both the traditional spatial and temporal scalabilities and object-based spatial and temporal scalabilities become possible. In the current version of the VM, spatial scalability is limited to rectangular VOPs. We now describe the encoding process for spatial and temporal scalability.

3.7.1 Spatial Scalability Encoding

3.7.1.1. Base Layer and Enhancement Layers

As mentioned earlier, in spatial scalability the base layer and the enhancement layer can have different spatial resolutions. In this VM, the base layer has the lower resolution and the enhancement layer has the higher resolution. For example, in simulations, the base layer uses QCIF resolution and the enhancement layer uses CIF resolution.

3.7.1.2. Downsampling

The downsampling process is performed in the Scalability Preprocessor. For example, the downsampling process from ITU-R 601 to CIF/QCIF is described in section 2.2.2 (Filtering process). Only the downsampling process for a factor of 2 is described in this document; however, downsampling by an arbitrary factor is allowed.

3.7.1.3. Encoding of Base Layer 

The encoding process of the base layer is the same as the nonscalable encoding process.

3.7.1.4. Upsampling Process 

The upsampling process is performed in the midprocessor. The base-layer VOP is locally decoded, and the decoded VOP is upsampled to the resolution of the enhancement layer. In the example above, upsampling is performed by the filtering process described in Figure 3.7.1.1 and Table 3.7.1.1.



Figure 3.7.1.1: QCIF upsampling



Factor    Tap no.    Filter taps    Divisor
2         1          1, 3           4
          2          (illegible)    4

Table 3.7.1.1
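A separable 2x upsampling sketch in the spirit of Table 3.7.1.1. Because the table is only partly legible, the taps are an assumption: each output sample weights its nearest input sample by 3 and the next-nearest by 1, divided by 4 (taps 1, 3 and their mirror image 3, 1). The rounding offset and edge replication are also assumptions, and the function names are hypothetical.

```python
def upsample2_1d(x: list) -> list:
    """Double the length of x with 2-tap interpolation (weights 3/4 and 1/4).

    Each input sample produces two outputs, a quarter of the input sample
    spacing to its left and to its right; edges are handled by replication.
    """
    n = len(x)
    out = []
    for i in range(n):
        left = x[max(i - 1, 0)]
        right = x[min(i + 1, n - 1)]
        out.append((left + 3 * x[i] + 2) // 4)   # taps (1, 3) / 4, rounded
        out.append((3 * x[i] + right + 2) // 4)  # taps (3, 1) / 4, rounded
    return out


def upsample2_2d(img: list) -> list:
    """Separable 2x upsampling of a 2-D sample array: rows, then columns."""
    rows = [upsample2_1d(row) for row in img]
    cols = [upsample2_1d(list(col)) for col in zip(*rows)]
    return [list(row) for row in zip(*cols)]
```

A constant input stays constant, and a ramp is smoothly interpolated: `upsample2_1d([0, 4, 8])` gives `[0, 1, 3, 5, 7, 8]`.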



3.7.1.5. Encoding of Enhancement Layer 

The VOPs in the enhancement layer are encoded as either P-VOPs or B-VOPs. The relationship between VOPs in the base layer and those in the enhancement layer is illustrated in Figure 3.7.1.2. A VOP that is temporally coincident with an I-VOP in the base layer is encoded as a P-VOP. A VOP that is temporally coincident with a P-VOP in the base layer is encoded as a B-VOP. In the case of spatial scalability, a decoded VOP in the base layer is used as a prediction reference. The temporally coincident VOP in the reference layer (the base layer) must be coded before the VOP in the enhancement layer is encoded.



Figure 3.7.1.2: Relationship between base-layer and enhancement-layer VOPs

3.7.1.6. Encoding of P-VOPs of Enhancement Layer

In a P-VOP, ref_select_code is set to "11", i.e., the prediction reference is set to the temporally coincident I-VOP in the base layer.

In a P-VOP, the motion vector is always set to 0 and is therefore not encoded, which reduces the overhead.

Mode decisions in P-VOP

In a P-VOP, the encoder decides whether to use INTRA or INTER prediction in the coding. The mode is decided by the same method as the INTRA/INTER mode decision in the coding of the base layer (see section 3.3.2.3). If INTER mode is chosen, the macroblock is coded using the prediction from the VOP in the base layer. The INTER4V mode is not used in the coding of the enhancement layer.

3.7.1.7. Encoding of B-VOPs of Enhancement Layer

In a B-VOP, ref_select_code is set to "00", i.e., the forward prediction reference is set to the temporally coincident P-VOP in the base layer, and the backward prediction reference is set to the P-VOP or B-VOP that is the most recently decoded VOP of the enhancement layer.

In a B-VOP, when forward prediction (i.e., prediction from the base layer) is selected, the motion vector is always set to 0 and is therefore not encoded, which reduces the overhead.
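The rules in sections 3.7.1.5 to 3.7.1.7 amount to a lookup keyed by the type of the temporally coincident base-layer VOP. The sketch below restates them as data; the function name and dictionary keys are hypothetical and only summarize the text.

```python
def enhancement_vop_setup(base_vop_type: str) -> dict:
    """Map the coincident base-layer VOP type to the enhancement-layer setup."""
    if base_vop_type == "I":
        return {
            "vop_type": "P",
            "ref_select_code": "11",
            "prediction_ref": "coincident base-layer I-VOP",
            "zero_mv_from_base": True,   # MV is always 0 and not transmitted
        }
    if base_vop_type == "P":
        return {
            "vop_type": "B",
            "ref_select_code": "00",
            "forward_ref": "coincident base-layer P-VOP",
            "backward_ref": "most recently decoded enhancement-layer VOP",
            "zero_mv_from_base": True,   # forward (base-layer) MV is always 0
        }
    raise ValueError("only I- and P-VOPs in the base layer are covered here")
```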

Mode decision for B-VOP

In the case of spatial scalability, the Direct (H.263 B) mode is not used, and a macroblock in a B-VOP is coded in one of the other three modes. To decide which mode is best, the encoder calculates the SAD (sum of absolute differences) for each of the three modes. The MBTYPE mode is selected as follows:

if (SAD_interpolate <= min{SAD_backward, SAD_forward})
    interpolate mode
else if (SAD_backward <= min{SAD_interpolate, SAD_forward})
    backward mode
else
    forward mode



3.7.2 Temporal Scalability Encoding 

In object-based temporal scalability (OTS), the frame rate of a selected object is enhanced so that it moves more smoothly than the remaining area; in other words, the frame rate of the selected object is higher than that of the remaining area. There are two types of enhancement structures in OTS.

Figure 3.7.2(a) shows an example of Type 1, where VOL0 (VideoObjectLayer 0) is an entire frame containing both an object and a background, while VOL1 represents the particular object in VOL0. VOL0 is coded at a low frame rate, and VOL1 is coded to achieve a higher frame rate than VOL0. In this example, frames 2 and 4 are formed by combining the two base-layer frames 0 and 6 and then overlaying the object of the enhancement layer onto the combined frame. The combined frame is formed using a process we call "background composition", as described in Section 5.3. In this example, forward predictions forming P-VOPs are used. Figure 3.7.2(b) shows another example of Type 1 that also uses bidirectional predictions forming B-VOPs in the enhancement layer.

Figure 3.7.3 shows an example of Type 2, where VO0 (VideoObject 0) is a sequence of entire frames that contain only a background and has no scalability layer. VO1 is the sequence of a particular object and has two scalability layers, VOL0 and VOL1. VOL1 represents the same object as VOL0 and is coded to achieve a higher frame rate than VOL0. In this example, VOL0 is regarded as the base layer and VOL1 as the enhancement layer of the OTS. Note that VO0 may not have the same frame rate as the other VOs.



Figure 3.7.2(a): Enhancement structure of Type 1 (Enhancement Layer VOL1 over Base Layer VOL0, plotted against frame number).

Figure 3.7.2(b): Enhancement structure of Type 1 with B-VOPs (Enhancement Layer VOL1 over Base Layer VOL0, plotted against frame number).

Figure 3.7.3: Enhancement structure of Type 2.



There are two types of enhancement for scalability, distinguished by the enhancement_type flag, whose meaning is explained in more detail below. As an example, Figure 3.7.4 shows an entire image containing several types of regions, for example a road, a car, and mountains. Both a base layer with enhancement_type "0" and a base layer with enhancement_type "1" are coded at lower picture quality, meaning that either the frame rate or the spatial resolution is lower. At the enhancement layer, the enhancement_type flag distinguishes the following two cases.

• When this flag is "1", the enhancement layer increases the picture quality of a partial region of the base layer. For example, in Figure 3.7.4, VOL0 is an entire frame and VOL1 is the car in the frame; the temporal or spatial resolution of the car is enhanced.

• When this flag is "0", the enhancement layer increases the picture quality of the entire region of the base layer. For example, in Figure 3.7.4, if VOL0 represents an entire frame, VOL1 is also the entire frame, and the temporal or spatial resolution of the entire frame is enhanced. If VOL0 represents the car, VOL1 is also the car, which is enhanced in temporal or spatial resolution.

Note that since only rectangular-VOP-based spatial scalability is included in the current version of the VM, the enhancement_type flag is always set to "0" for spatial scalability.
Figure 3.7.4 panels: enhancement_type = 1, base layer VOL0: entire frame, enhancement layer VOL1: car; enhancement_type = 0, VOL0: entire frame with VOL1: entire frame, or VOL0: car with VOL1: car. The marked area denotes the region to be enhanced by an enhancement layer.

Figure 3.7.4: Example of a region to be enhanced.

4. Bitstream Syntax 

4.1 General Structure 

The syntax consists of the following class hierarchy:
• VideoSession (VS)
• VideoObject (VO)
• VideoObjectLayer (VOL)
• VideoObjectPlane (VOP)

Within the context of video experiments, a VS is a collection of one or more VOs, a VO can consist of one or more layers, and each layer consists of an ordered sequence of snapshots in time called VOPs. Thus there can be several VOs (VO0, VO1, ...) in a VS; for each VO there can be several layers (VOL0, VOL1, ...); and each layer consists of a time sequence of VOPs (VOP0, VOP1, ...), which are basically snapshots in time. A VO can be of arbitrary shape (rectangular is a special case). For single-layered coding, only one VOL (VOL0) exists per VO. Figure xxx shows the hierarchical structure of the syntax.

For the purpose of conducting core experiments, the bitstreams for the VideoSession and each of the VideoObjects are stored in separate files. The multiplexing of these bitstreams will be provided by the MSDL.

Figure xxx levels: VideoSession (VS0, VS1), VideoObject (VO0, ...), VideoObjectLayer (VOL0, VOL1), VideoObjectPlane (VOP0, VOP1, ... in Layer 0 and in Layer 1).

Figure xxx: Hierarchy in the proposed video syntax
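The class hierarchy can be mirrored directly as nested containers. This is a minimal sketch with hypothetical field names; only the ID widths (5 bits for video_object_id, 4 bits for video_object_layer_id) come from sections 4.3 and 4.4, and a VOP is reduced to a timestamp and a prediction type.

```python
from dataclasses import dataclass, field


@dataclass
class VideoObjectPlane:
    time_ms: int
    prediction_type: str            # "I", "P" or "B"


@dataclass
class VideoObjectLayer:
    layer_id: int                   # 4-bit video_object_layer_id
    vops: list = field(default_factory=list)

    def __post_init__(self) -> None:
        if not 0 <= self.layer_id <= 15:
            raise ValueError("video_object_layer_id is a 4-bit code")


@dataclass
class VideoObject:
    object_id: int                  # 5-bit video_object_id
    layers: list = field(default_factory=list)

    def __post_init__(self) -> None:
        if not 0 <= self.object_id <= 31:
            raise ValueError("video_object_id is a 5-bit code")


@dataclass
class VideoSession:
    objects: list = field(default_factory=list)


# One VO with a single layer (VOL0) holding two VOPs.
session = VideoSession([VideoObject(0, [VideoObjectLayer(
    0, [VideoObjectPlane(0, "I"), VideoObjectPlane(40, "P")])])])
```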



4.2 Video Session Class



Video Session Class

Syntax                                         No. of bits    Mnemonic
VideoSession() {
    video_session_start_code                   sc+8=32
    do {
        VideoObject()
    } while (nextbits() == video_object_start_code)
    video_session_end_code                     sc+8=32
}

Concurrent loop solution to be provided by the MSDL.



sc

This code is intended to be used in combination with additional bits for the purpose of synchronization. Its binary representation is 23 zeros followed by a 1 (0000 0000 0000 0000 0000 0001), and its hexadecimal representation is '000001'.

video_session_start_code

This code cannot be emulated by any combination of other valid bits in the bitstream and is used for synchronization purposes. Its value is that of sc followed by 'B0' in hexadecimal.

video_session_end_code

This code cannot be emulated by any combination of other valid bits in the bitstream and is used for synchronization purposes. Its value is that of sc followed by 'B1' in hexadecimal. video_session_end_code resets all data relating to VOPs; in other words, different sessions are treated completely independently.
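The start code arithmetic can be made concrete with a short sketch. It builds the 32-bit session start and end codes from sc (0x000001) and the 'B0'/'B1' suffixes given above, and shows a bit-level nextbits() peek of the kind used in the syntax tables. The helper names are hypothetical, and the 27- and 28-bit object and layer start codes are not built because their suffix values are not given in this section.

```python
SC = 0x000001      # start code prefix: 23 zeros followed by a one
SC_BITS = 24


def start_code(suffix: int, suffix_bits: int) -> tuple:
    """Return (value, length_in_bits) of sc followed by suffix_bits of suffix."""
    return (SC << suffix_bits) | suffix, SC_BITS + suffix_bits


VIDEO_SESSION_START_CODE = start_code(0xB0, 8)   # (0x000001B0, 32)
VIDEO_SESSION_END_CODE = start_code(0xB1, 8)     # (0x000001B1, 32)


def nextbits(data: bytes, bit_pos: int, n: int) -> int:
    """Peek n bits (MSB first) starting at bit offset bit_pos, without consuming them."""
    value = 0
    for i in range(bit_pos, bit_pos + n):
        bit = (data[i // 8] >> (7 - i % 8)) & 1
        value = (value << 1) | bit
    return value


stream = bytes([0x00, 0x00, 0x01, 0xB0])
print(nextbits(stream, 0, 32) == VIDEO_SESSION_START_CODE[0])  # True
```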



4.3 Video Object Class



Video Object

Syntax                                         No. of bits    Mnemonic
VideoObject() {
    video_object_start_code                    sc+3=27
    video_object_id                            5
    do {
        VideoObjectLayer()
    } while (nextbits() == video_object_layer_start_code)
}







video_object_start_code

This is a unique code of length 27 bits (sc+3) that precedes the video_object_id.

video_object_id

This is a 5-bit code that identifies a video object in the scene being processed.
4.4 Video Object Layer Class



Video Object Layer

Syntax                                         No. of bits    Mnemonic
VideoObjectLayer() {
    video_object_layer_start_code              sc+4=28
    video_object_layer_id                      4
    video_object_layer_shape                   2
    if (video_object_layer_shape == "00") {
        video_object_layer_width               10
        video_object_layer_height              10
    }
    video_object_layer_quant_type              1
    if (video_object_layer_quant_type) {
        load_intra_quant_mat                   1
        if (load_intra_quant_mat)
            intra_quant_mat[64]                8*64
        load_nonintra_quant_mat                1
        if (load_nonintra_quant_mat)
            nonintra_quant_mat[64]             8*64
    }
    intra_dcpred_disable                       1
    video_object_layer_fcode_forward           2
    video_object_layer_fcode_backward          2
    separate_motion_shape_texture              1
    scalability                                1
    if (scalability) {
        ref_layer_id                           4
        ref_layer_sampling_direc               1
        hor_sampling_factor_n                  5
        hor_sampling_factor_m                  5
        vert_sampling_factor_n                 5
        vert_sampling_factor_m                 5
        enhancement_type                       1
    }
    do {
        VideoObjectPlane()
    } while (nextbits() == video_object_plane_start_code)
}







video_object_layer_start_code

This is a unique code of length 28 bits (sc+4) that precedes the video_object_layer_id.



video_object_layer_id 

This is a 4-bit code which identifies a video object layer for a video object being processed. 
video_object_layer_shape 

This is a 2-bit code that identifies the shape type of the video object layer, as shown in Table 0x.



Table 0x: Video Object Layer shape types

video_object_layer_shape    Code
rectangular                 00
binary                      01
gray-scale                  10



This code is "00" if the VOL shape is rectangular; "01" if the VOL has a binary shape (i.e., each pixel of the rectangle defined by video_object_layer_width and video_object_layer_height is either part of the VOL or not); and "10" if the VOL shape is defined by gray-scale data (i.e., each pixel of the rectangle defined by video_object_layer_width and video_object_layer_height is to be linearly combined with the pixels of other VOLs at the same spatial location).

video_object_layer_width, video_object_layer_height

These two codes define the picture size for the session, in pixel units (zero values are forbidden). This is also the size of the unique VOL of the session.

video_object_layer_quant_type 

A 1-bit code that indicates the type of quantization method selected. When it has a value of 0, the H.263 quantization method is selected; otherwise, the MPEG-1/2 quantization method is selected.

load_intra_quant_mat 

A 1-bit code that indicates whether the default matrix for visual weighting of the DCT coefficients of intra macroblocks is used or a new matrix for visual weighting of the DCT coefficients of intra macroblocks is to be loaded.

intra_quant_mat[64] 

This is a one-dimensional (1-D) array of 64 values (8 bits per value, in the range 1 to 255) for visual weighting of the DCT coefficients of intra macroblocks. The values in the 1-D array are in the same order as that obtained by zig-zag scanning of an 8x8 two-dimensional array. In addition, the first value of intra_quant_mat[64] should always be 8.

load_nonintra_quant_mat

A 1-bit code that indicates whether the default matrix for visual weighting of the DCT coefficients of nonintra macroblocks is used or a new matrix for visual weighting of the DCT coefficients of nonintra macroblocks is to be loaded.

nonintra_quant_mat[64] 

This is a one-dimensional (1-D) array of 64 values (8 bits per value, in the range 0 to 255) for visual weighting of the DCT coefficients of nonintra macroblocks. The values in the 1-D array are in the same order as that obtained by zig-zag scanning of an 8x8 two-dimensional array.
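Loading intra_quant_mat[64] or nonintra_quant_mat[64] means placing 64 zig-zag-ordered values back into an 8x8 matrix. The sketch below assumes the conventional MPEG-1/2 zig-zag scan (the section does not spell out the scan order), and its function names are hypothetical; check_intra_quant_mat applies the range and first-value constraints stated for intra_quant_mat[64].

```python
def zigzag_order(n: int = 8) -> list:
    """(row, col) positions of an n x n block in conventional zig-zag scan order."""
    order = []
    for s in range(2 * n - 1):                      # s = row + col (anti-diagonal)
        lo, hi = max(0, s - n + 1), min(s, n - 1)
        # Even diagonals run up-right (row decreasing), odd ones down-left.
        rows = range(hi, lo - 1, -1) if s % 2 == 0 else range(lo, hi + 1)
        order.extend((r, s - r) for r in rows)
    return order


def dezigzag(values: list) -> list:
    """Rebuild an 8x8 matrix from 64 values transmitted in zig-zag order."""
    if len(values) != 64:
        raise ValueError("a quantization matrix has 64 entries")
    matrix = [[0] * 8 for _ in range(8)]
    for value, (r, c) in zip(values, zigzag_order()):
        matrix[r][c] = value
    return matrix


def check_intra_quant_mat(values: list) -> bool:
    """64 values in 1..255, with the first (DC) value equal to 8."""
    return len(values) == 64 and values[0] == 8 and all(1 <= v <= 255 for v in values)
```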

intra_dcpred_disable

A 1-bit code whose value is 1 when DC prediction of intra-coded blocks is to be disabled. In the default mode, DC prediction of intra macroblocks is enabled.

video_object_layer_fcode_forward, video_object_layer_fcode_backward

These are 2-bit codes that specify the dynamic range of motion vectors. 

separate_motion_shape_texture 

This flag is "1" if all the coding data (e.g., motion, shape, texture) for the VOP are grouped together, and "0" if the coding data are grouped macroblock by macroblock.

scalability 

This is a 1-bit flag that indicates whether the current layer uses scalable coding. If the current layer is used as the base layer, this flag is '0'.

ref_layer_id

This is a 4-bit code that indicates the layer to be used as the reference for the prediction(s) in the case of scalability. It can have a value between 0 and 15.

ref_layer_sampling_direc

This is a 1-bit flag. A value of "0" indicates that the reference layer specified by ref_layer_id has the same or lower resolution than the layer being coded; a value of "1" indicates that the resolution of the reference layer is higher than the resolution of the layer being coded.

hor_sampling_factor_n, hor_sampling_factor_m

These are 5-bit quantities in the range 1 to 31 whose ratio hor_sampling_factor_n/hor_sampling_factor_m indicates the resampling needed in the horizontal direction; the direction of sampling is indicated by ref_layer_sampling_direc.

vert_sampling_factor_n, vert_sampling_factor_m

These are 5-bit quantities in the range 1 to 31 whose ratio vert_sampling_factor_n/vert_sampling_factor_m indicates the resampling needed in the vertical direction; the direction of sampling is indicated by ref_layer_sampling_direc.

enhancement_type

This is a 1-bit flag that indicates the type of enhancement structure in scalability. It has a value of "1" when an enhancement layer enhances a partial region of the base layer, and a value of "0" when an enhancement layer enhances the entire region of the base layer. The default value of this flag is "0".
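To tie the syntax table and these semantics together, the following sketch parses a VideoObjectLayer() header up to enhancement_type, following the field order and widths of the table in section 4.4. For readability it reads from a string of '0'/'1' characters rather than bytes; the class and function names are hypothetical, and the 28-bit start code is read but not validated because its last four bits are not specified in this section.

```python
class BitReader:
    """Reads MSB-first fields from a string of '0'/'1' characters."""

    def __init__(self, bits: str) -> None:
        self.bits = bits
        self.pos = 0

    def read(self, n: int) -> int:
        if self.pos + n > len(self.bits):
            raise EOFError("not enough bits left")
        value = int(self.bits[self.pos:self.pos + n], 2)
        self.pos += n
        return value


def parse_vol_header(r: BitReader) -> dict:
    """Parse VideoObjectLayer() fields up to enhancement_type (VOPs not parsed)."""
    h = {"start_code": r.read(28), "id": r.read(4), "shape": r.read(2)}
    if h["shape"] == 0b00:                                  # rectangular
        h["width"], h["height"] = r.read(10), r.read(10)
    h["quant_type"] = r.read(1)
    if h["quant_type"]:                                     # MPEG-1/2 quantization
        if r.read(1):                                       # load_intra_quant_mat
            h["intra_quant_mat"] = [r.read(8) for _ in range(64)]
        if r.read(1):                                       # load_nonintra_quant_mat
            h["nonintra_quant_mat"] = [r.read(8) for _ in range(64)]
    h["intra_dcpred_disable"] = r.read(1)
    h["fcode_forward"], h["fcode_backward"] = r.read(2), r.read(2)
    h["separate_motion_shape_texture"] = r.read(1)
    h["scalability"] = r.read(1)
    if h["scalability"]:
        for name, width in (("ref_layer_id", 4), ("ref_layer_sampling_direc", 1),
                            ("hor_sampling_factor_n", 5), ("hor_sampling_factor_m", 5),
                            ("vert_sampling_factor_n", 5), ("vert_sampling_factor_m", 5),
                            ("enhancement_type", 1)):
            h[name] = r.read(width)
    return h
```

For a rectangular layer the returned dictionary carries width and height, and the scalability fields appear only when scalability is 1.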
4.5 VideoObjectPlane Class



VideoObjectPlane

Syntax                                         No. of bits    Mnemonic
VideoObjectPlane() {
    VOP_start_code                             sc+8=32
    do {
        modulo_time_base                       1
    } while (modulo_time_base != "0")
    VOP_time_increment                         10
    VOP_prediction_type                        2
    if (video_object_layer_shape != "00") {
        VOP_width                              10
        VOP_height                             10
        VOP_horizontal_mc_spatial_ref          10
        marker_bit                             1
        VOP_vertical_mc_spatial_ref            10
        if (scalability && enhancement_type)
            background_composition
    }
    if (VOP_prediction_type == "10")
        VOP_dbquant                            2
    else
        VOP_quant                              5
    if (!scalability) {
        if (!separate_motion_shape_texture)
            combined_motion_shape_texture_coding()
        else {
            do {
                first_MMR_code                 1-2
            } while (count of macroblocks != total number of macroblocks)
            motion_coding()
            shape_coding()
            texture_coding()
        }
    }
    else {
        if (background_composition) {
            load_backward_shape
            if (load_backward_shape) {
                backward_shape_coding()
                load_forward_shape
                if (load_forward_shape)
                    forward_shape_coding()
            }
        }
        ref_select_code                        2
        if (VOP_prediction_type == "01" || VOP_prediction_type == "10") {
            forward_temporal_ref               10
            if (VOP_prediction_type == "10") {
                marker_bit                     1
                backward_temporal_ref          10
            }
        }
        combined_motion_shape_texture_coding()
    }
}



VOP_start_code

This code cannot be emulated by any combination of other valid bits in the bitstream and is used for synchronization purposes. It is sc followed by a unique 8-bit code.

modulo_time_base

This value represents the local time base in units of one second (1000 milliseconds). It is represented as a marker transmitted in the VOP header: the number of consecutive "1"s followed by a "0" indicates the number of seconds that have elapsed since the synchronisation point marked by the last encoded/decoded modulo_time_base.



VOP_time_increment

This value represents the local time base in units of milliseconds. For I- and P-VOPs, this value is the absolute VOP_time_increment from the synchronisation point marked by the last modulo_time_base. For B-VOPs, this value is the VOP_time_increment relative to the last encoded/decoded I- or P-VOP.



Figure (unnumbered): local time base, marked by modulo_time_base and VOP_time_increment.
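For I- and P-VOPs, the two fields split a timestamp into whole seconds (signalled relative to the last synchronisation point as a run of "1"s terminated by "0") and a millisecond remainder. A sketch under that reading, with hypothetical function names; B-VOP timing, which is relative to the last I- or P-VOP, is not modelled.

```python
def encode_time_base(t_ms: int, sync_seconds: int) -> tuple:
    """Split an I/P-VOP time into (modulo_time_base bits, VOP_time_increment, new sync second)."""
    seconds = t_ms // 1000
    if seconds < sync_seconds:
        raise ValueError("time precedes the synchronisation point")
    bits = "1" * (seconds - sync_seconds) + "0"   # one "1" per elapsed second
    return bits, t_ms - 1000 * seconds, seconds


def decode_time_base(bits: str, increment: int, sync_seconds: int) -> tuple:
    """Recover (time in ms, new sync second) from the two fields."""
    seconds = sync_seconds + bits.index("0")      # count the leading "1"s
    return 1000 * seconds + increment, seconds
```

The millisecond remainder is always below 1000, so it fits in the 10-bit VOP_time_increment field.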



To produce a picture at a given time (according to the display frame rate), the simplest solution is to use the most recently decoded data of each VOP to be displayed. Another, more complex possibility, suited to non-real-time applications, is to interpolate each VOP from its two occurrences temporally surrounding the required instant, based on their temporal references.

VOP_prediction_type

This code indicates the prediction mode to be used for decoding the VOP, as shown in Table 1x.



Table 1x: VOP prediction types

VOP_prediction_type    Code
I                      00
P                      01
B                      10



VOP_width, VOP_height

These two numerical values define the size of the smallest rectangle that includes the VOP, in pixel units (zero values are forbidden).

VOP_horizontal_mc_spatial_ref, VOP_vertical_mc_spatial_ref

These values are used for decoding and for picture composition. They indicate the spatial position of the top-left corner of the rectangle defined by VOP_width and VOP_height, in pixel units, with the origin of coordinates at the top-left corner of the picture. The values are represented in binary as positive integers. This is script information that may be changed at the user's request.

marker_bit

This is a single bit always set to '1' in order to avoid start code emulation.

background_composition

This flag occurs only when the scalability flag has a value of "1". The default value of this flag is "0". It is used in conjunction with the enhancement_type flag: if enhancement_type is "1" and this flag is "1", background composition is performed; if enhancement_type is "1" and this flag is "0", the background is repeated from the nearest frame in the base layer. If enhancement_type is "0", no action needs to be taken as a consequence of any value of this flag.

shape_coding() 

The shape_coding() function generates the format of the coded data of a current shape (alpha plane). 

load_backward_shape

If this flag is "1", backward_shape of the previous VOP is copied to forward_shape for the current VOP, and backward_shape for the current VOP is decoded from the bitstream. If not, forward_shape of the previous VOP is copied to forward_shape for the current VOP, and backward_shape of the previous VOP is copied to backward_shape for the current VOP.

backward_shape_coding() 

It specifies the format of coded data for backward_shape and is identical to that of shape_coding(). 
load_forward_shape 

This flag is "1" if forward_shape will be decoded from the bitstream. 
forward_shape_coding() 

It specifies the format of coded data for forward_shape and is identical to that of shape_coding(). 
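The buffer updates implied by load_backward_shape and load_forward_shape can be summarised in a short Python sketch. It assumes that the two flags are evaluated in the order in which they are described above (backward first, then forward). The callables decode_backward and decode_forward are hypothetical stand-ins for parsing backward_shape_coding() and forward_shape_coding(), and are not defined in this document.

```python
def update_shape_buffers(prev_forward, prev_backward,
                         load_backward_shape, load_forward_shape,
                         decode_backward, decode_forward):
    """Return (forward_shape, backward_shape) for the current VOP.

    decode_backward / decode_forward are hypothetical callables that parse
    backward_shape_coding() / forward_shape_coding() from the bitstream.
    """
    if load_backward_shape:
        # The previous backward shape becomes the new forward shape,
        # and a fresh backward shape is read from the bitstream.
        forward_shape = prev_backward
        backward_shape = decode_backward()
    else:
        # Both shapes are carried over unchanged from the previous VOP.
        forward_shape = prev_forward
        backward_shape = prev_backward
    if load_forward_shape:
        # Assumption: an explicitly coded forward shape replaces the copy.
        forward_shape = decode_forward()
    return forward_shape, backward_shape
```

Passing the decoders as callables means that the bitstream is read only when the corresponding flag is set, which mirrors the conditional presence of the two shape_coding() structures.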
ref_select_code 

This is a 2-bit code which indicates prediction reference choices for P- and B-VOPs in the enhancement 
layer with respect to the decoded reference layer identified by ref_layer_id. 

forward_temporal_ref 

An unsigned integer value which indicates the temporal reference of the decoded reference layer VOP to be 
used for forward prediction (Tables 5.3.1 and 5.3.2). 

backward_temporal_ref 

An unsigned integer value which indicates the temporal reference of the decoded reference layer VOP to be 
used for backward prediction (Table 5.3.2). 
VOP_dbquant 

VOP_dbquant is present if VOP_prediction_type indicates a B-VOP (VOP_prediction_type = '10'). The value of quant 
ranges from 1 to 31. VOP_dbquant is a 2-bit fixed length code that indicates the relationship between quant and 
bquant. Depending on the value of VOP_dbquant, bquant is calculated according to the relationship shown 
in Table 2x and is clipped to lie in the range 1 to 31. In this table, "/" means truncation. 

Table 2x: VOP_dbquant codes and relation between quant and bquant

dbquant   bquant
00        (5 x quant)/4
01        (6 x quant)/4
10        (7 x quant)/4
11        (8 x quant)/4
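The bquant derivation can be sketched in Python as shown below. This is an illustrative helper (the name bquant_from_dbquant is hypothetical). It assumes that VOP_dbquant has already been parsed as an integer from 0 to 3, and that the table's truncation applies to the positive product.

```python
def bquant_from_dbquant(quant, dbquant):
    """Derive bquant from quant and the 2-bit VOP_dbquant value (0..3)."""
    if not 1 <= quant <= 31:
        raise ValueError("quant must lie in 1..31")
    if not 0 <= dbquant <= 3:
        raise ValueError("VOP_dbquant is a 2-bit code")
    # Codes 00, 01, 10, 11 select the factors 5/4, 6/4, 7/4 and 8/4.
    # Floor division implements the truncation ("/") used in Table 2x;
    # both operands are positive, so floor and truncation agree here.
    bquant = ((5 + dbquant) * quant) // 4
    # Clip the result to the legal quantizer range 1..31.
    return max(1, min(31, bquant))
```

For example, quant = 10 with VOP_dbquant = '00' gives (5 x 10)/4 = 12 after truncation. With quant = 31 and code '11', the result 62 is clipped to 31.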



VOP_quant 

A fixed-length codeword of 5 bits which indicates the quantizer to be used for the VOP until updated by any 
subsequent value of DQUANT. The codewords are the natural binary representations of the quantizer values 
which, being half the step sizes, range from 1 to 31. 

first_MMR_code 

For I-VOPs: '00' indicates that the subsequent alpha data exist (multilevel), '01' indicates that all samples in 
the alpha block are "0", and '1' indicates that all samples are "255". For P-VOPs and B-VOPs: '00' indicates 
that the subsequent alpha data exist (multilevel), '01' indicates that all samples in the alpha block are "0", 
'10' indicates that all samples are "255", and '11' indicates that the alpha data are inter-coded. 



4.6 Shape coding 
shape_coding 

Syntax                                                              No. of bits
shape_coding() {
    if (video_object_shape != '00')
        do {
            if (first_MMR_code == '00') {
                CR                                                  1-2
                a0_color                                            1
                do {
                    VLC_binary                                      1-9
                    if (Mode == H) {
                        if (Vertical_pass_mode == TRUE) {
                            RLB                                     2-4
                        } else {
                            ULB                                     2-4
                        }
                    }
                } while (Mode != EOMB)
            }
        } while (count of macroblocks != total number of macroblocks)
    if (video_object_shape == '10')
        do {
            Gray_shape_coding()
        } while (count of macroblocks != total number of macroblocks)
}

CR: conversion ratio, described in Section 3.2.1. The codeword table is shown in Table M1. 
Table M1: VLC for CR

CR    Code
1     0
1/2   10
1/4   11



a0_color: a 1-bit code indicating the color of the first pixel in an MB (0: white, 1: black). 
VLC_binary: variable length code for binary shape information, shown in Table M2. 

Table M2: VLC table for Modified MMR

Mode          Code
V0 (DIST=0)   1
V1 (DIST=1)   01s
V2 (DIST=2)   00001s
V3 (DIST=3)   000001s
V4 (DIST=4)   0000001s
V5 (DIST=5)   00000001s
H             001
EOMB          0001



V0 - V5: Vertical mode 

H: Horizontal mode 

EOMB: End of MB 

DIST: absolute value of (r_a1 - r_b1) 

s: sign bit (s=1 if r_a1 - r_b1 > 0, and s=0 if r_a1 - r_b1 < 0) 

Vertical_pass_mode: TRUE = the mode is Vertical Pass mode. 
FALSE = the mode is not Vertical Pass mode (i.e., it is Horizontal mode or Vertical mode). 

RLB: Residual-length of binary shape information. 

ULB: Unchanged-length of binary shape information. 
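Because the codewords of Table M2 form a prefix-free set, a decoder can read bits one at a time until a codeword matches. The Python sketch below illustrates this under a simplifying assumption: the shape bits are available as a string of '0'/'1' characters rather than a real bit reader. The function name decode_mmr_mode is hypothetical.

```python
# Table M2 codewords: prefix -> (mode, DIST). V1..V5 are followed by a sign bit s.
MMR_CODES = {
    "1": ("V0", 0),
    "01": ("V1", 1),
    "001": ("H", None),
    "0001": ("EOMB", None),
    "00001": ("V2", 2),
    "000001": ("V3", 3),
    "0000001": ("V4", 4),
    "00000001": ("V5", 5),
}


def decode_mmr_mode(bits, pos=0):
    """Decode one mode codeword at bits[pos].

    Returns (mode, dist, s, new_pos); s is None when no sign bit is sent.
    """
    for length in range(1, 9):
        prefix = bits[pos:pos + length]
        if prefix in MMR_CODES:
            mode, dist = MMR_CODES[prefix]
            pos += length
            s = None
            if mode in ("V1", "V2", "V3", "V4", "V5"):
                s = int(bits[pos])  # s = 1 if r_a1 - r_b1 > 0
                pos += 1
            return mode, dist, s, pos
    raise ValueError("invalid Modified MMR codeword at bit %d" % pos)
```

A real implementation would normally use a lookup table indexed by the next 8 bits instead of a loop. The loop is kept here because it mirrors the table directly.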
4.7 Motion Shape Texture 



4.7.1 Combined Motion Shape Texture 

The motion shape texture coding method used for I-, P- and B-VOPs is described in Appendix B. The 
advanced prediction mode and overlapping motion estimation are also described in that appendix. The 
macroblock layer syntax for each coded macroblock consists of macroblock header information, which also 
includes motion vectors, and block data, which consists of DCT data (coded texture information). 

4.7.2 Separate Motion Shape Texture for I- and P-VOPs 
motion_coding 

The syntax for the encoded motion vectors of macroblocks that belong to the VOP would be: 



No. of Vectors (1-2 bits) | Encoded vector (N bits) ... | No. of Vectors (1-2 bits) | Encoded vector (N bits) ... 





No. of Vectors: 

Huffman coded number of motion vectors for each macroblock (0, 1, or 4). 

A '0' indicates that there is no motion compensation for the macroblock. Hence the data coded by the texture 
coding part will either be the actual pixel values in the current image at this macroblock location (INTRA coded) 
or be skipped by the texture coding syntax. A '1' indicates that this macroblock is motion compensated by a 16 
x 16 motion vector. Similarly, a '4' indicates that this macroblock is motion compensated by four 8 x 8 motion 
vectors. 



The Huffman codes used to code the No. of Vectors field are: 

Value   Length   Code
0       2        11
1       1        0
4       2        10
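The No. of Vectors codewords above can be encoded and decoded as in the following Python sketch. It assumes the bitstream is represented as a string of '0'/'1' characters, which is a simplification for illustration. The helper names are hypothetical.

```python
# Codewords for the No. of Vectors field (value -> code), from the table above.
NUM_VECTORS_CODES = {0: "11", 1: "0", 4: "10"}
# Inverse mapping used by the decoder.
NUM_VECTORS_VALUES = {code: value for value, code in NUM_VECTORS_CODES.items()}


def encode_num_vectors(count):
    """Return the codeword for 0, 1 or 4 motion vectors."""
    return NUM_VECTORS_CODES[count]


def decode_num_vectors(bits, pos=0):
    """Decode No. of Vectors starting at bits[pos]; return (count, new_pos)."""
    if bits[pos] == "0":        # 1-bit codeword: one 16 x 16 vector
        return 1, pos + 1
    code = bits[pos:pos + 2]    # the 2-bit codewords both start with '1'
    return NUM_VECTORS_VALUES[code], pos + 2
```

The single most common case, one 16 x 16 vector, receives the shortest codeword.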



Encoded vector: 



These are coded differentially, using the same prediction scheme and Huffman tables as described in 
Sections 3.3.2.6 to 3.3.2.8. 

shape_coding 

As in section 4.6. 

texture_coding 

The texture data for the macroblocks belonging to the VOP is coded using DCT coding as in Section 3.4. 
The syntax of the texture coding for each of the macroblocks in the VOP is as follows: 



DCT/NO_DCT flag (1 bit) | CBPY (2-6 bits) | CBPC (1-3 bits) | DCT data (N bits) 



DCT/NO_DCT flag: 

This flag indicates whether or not a macroblock has DCT data. The Huffman codes 
used to code the DCT/NO_DCT flag are: 

Value    Length   Code
DCT      1        0
NO_DCT   1        1



To skip a macroblock, No. of Vectors is set to '0' and the DCT/NO_DCT flag to '1'. 



CBPY : 

Coded Block Pattern for Luminance; uses the same Huffman tables as in Appendix B. 
CBPC : 



Coded Block Pattern for Chrominance; uses the following Huffman table: 

Value   Length   Code
00      1        1
01      3        001
10      3        010
11      3        011



DCTdata: 

DCT encoded macroblock data using the same 3D VLC as in Appendix B. 



4.7.3 Separate Motion Shape Texture for B-VOPs 
motion_coding 

The syntax for the encoded macroblock overhead information and motion vectors of macroblocks that 
belong to a B-VOP is: 

MB header | MB header | ... | MB header | MB header 

As defined earlier in section 4.7.1, the MB header consists of MODB, MBTYPE, CBPB, DQUANT and 
any associated motion vector/s (as indicated by MBTYPE). 

shape_coding 

As in Section 4.6. 

texture_coding 



The syntax for the encoded DCT coefficients representing texture data is as follows: 



Block Data | Block Data | ... | Block Data | Block Data 



Again as defined in section 4.7.1, the Block data simply consists of DCT coefficients. 

5. Decoder Definition 
5.1 Overview 



Figure 5.1.1 presents a general overview of the VOP decoder structure. The same decoding scheme is 
applied when decoding all the VOPs of a given session. 



(Figure: block diagram with Demultiplexer, Shape Decoding, Motion Decoding, Texture Decoding, Motion 
Compensation, VOP Memory and Reconstructed VOP.) 

Figure 5.1.1: VOP decoder structure. 

The decoder is mainly composed of two parts: the shape decoder and the traditional motion and texture 
decoder. The reconstructed VOP is obtained by appropriately combining the shape, texture and motion 
information. 



5.2 Shape decoding 

The decoding process is described below. 

abs_a0 = 0
DECODE a0_color
DETECT b1
if (b1 is detected)
    Vertical_pass_mode = FALSE
else
    Vertical_pass_mode = TRUE
do {
    DECODE Mode                                /* VLC_binary, Table M2 */
    if (Mode == H) {
        if (Vertical_pass_mode == FALSE) {     /* see 3.2.1.3.1 */
            DECODE ULB
            if (ULB == WIDTH) Vertical_pass_mode = TRUE
            else abs_a0 = abs_a1
        } else {                               /* Vertical_pass_mode == TRUE, see 3.2.1.3.2 */
            DECODE RLB
            Vertical_pass_mode = FALSE
            abs_a0 = abs_a1
        }
    }
    else if (Mode == V0) {
        if (Vertical_pass_mode == TRUE) { /* see 3.2.1.3.2 */ }
        else {
            DETECT b1
            r_a1 = r_b1
            abs_a0 = abs_a1
        }
    }
    else if (Mode == V1) {
        DETECT b1
        DIST = 1
        if (s == 0) r_a1 = r_b1 - DIST
        else        r_a1 = r_b1 + DIST
        abs_a0 = abs_a1
    }
    /* V2, V3 and V4 are handled in the same way, with DIST = 2, 3 and 4 */
    else if (Mode == V5) {
        DETECT b1
        DIST = 5
        if (s == 0) r_a1 = r_b1 - DIST
        else        r_a1 = r_b1 + DIST
        abs_a0 = abs_a1
    }
} while (Mode != EOMB)
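Outside the vertical pass branch, the vertical-mode branches of the loop above differ only in DIST, so they can be folded into a single helper. The following Python sketch is illustrative only. The name vertical_mode_r_a1 is hypothetical, and the vertical pass handling of Section 3.2.1.3.2 is not covered.

```python
# DIST for each vertical mode, following Table M2 (V0 = 0 ... V5 = 5).
VERTICAL_DIST = {"V0": 0, "V1": 1, "V2": 2, "V3": 3, "V4": 4, "V5": 5}


def vertical_mode_r_a1(mode, r_b1, s=None):
    """Return the new position r_a1 for a vertical mode.

    r_b1 is the detected position of b1 and s is the sign bit sent after
    V1..V5 (s = 1 means r_a1 - r_b1 > 0). V0 sends no sign bit.
    """
    dist = VERTICAL_DIST[mode]
    if dist == 0:
        return r_b1          # V0: a1 is at the same position as b1
    if s == 0:
        return r_b1 - dist
    return r_b1 + dist
```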
5.3 Generalized Scalable Decoding 

We now discuss the decoding issues in generalized scalable decoding. Considering the case of two layers, a 
base layer and an enhancement layer, the spatial resolution of each layer may be either the same or 
different. When the layers have different spatial resolutions, up- or down-sampling of the base layer with respect 
to the enhancement layer becomes necessary for generating predictions. If the lower layer and the 
enhancement layer are temporally offset, motion compensated prediction may be used between layers, 
irrespective of the spatial resolutions. When the layers are coincident in time but at different resolutions, 
motion compensation may be switched off to reduce overhead. 

The reference VOPs for prediction are selected by ref_select_code as specified in Tables 5.3.1 and 
5.3.2. In coding of P-VOPs belonging to an enhancement layer, the forward reference is one of the 
following three: the most recent decoded VOP of the enhancement layer, the most recent VOP of the lower 
layer in display order, or the next VOP of the lower layer in display order. 

In B-VOPs, the forward reference is one of two: the most recent decoded enhancement VOP or the most 
recent lower layer VOP in display order. The backward reference is one of three: the temporally 
coincident VOP in the lower layer, the most recent lower layer VOP in display order, or the next lower 
layer VOP in display order. 

Table 5.3.1: Prediction reference choices for P-VOPs in object-based temporal scalability.

ref_select_code   Forward prediction reference
00                Most recent decoded enhancement VOP belonging to the same layer.
01                Most recent VOP in display order belonging to the reference layer.
10                Next VOP in display order belonging to the reference layer.
11                Temporally coincident VOP in the reference layer (no motion vectors).



Table 5.3.2: Prediction reference choices for B-VOPs in the case of scalability.

ref_select_code   Forward temporal reference / Backward temporal reference
00                Forward: temporally coincident VOP in the reference layer (no motion vectors).
                  Backward: most recent decoded enhancement VOP of the same layer.
01                Forward: most recent decoded enhancement VOP of the same layer.
                  Backward: most recent VOP in display order belonging to the reference layer.
10                Forward: most recent decoded enhancement VOP of the same layer.
                  Backward: next VOP in display order belonging to the reference layer.
11                Forward: most recent VOP in display order belonging to the reference layer.
                  Backward: next VOP in display order belonging to the reference layer.



The enhancement layer can contain I-, P- or B-VOPs, but the B-VOPs in the enhancement layer behave 
more like P-VOPs, at least in the sense that a decoded B-VOP can be used to predict the following P- or 
B-VOPs. 



When the most recent VOP in the base layer is used as reference, this includes the VOP that is temporally 
coincident with the VOP in the enhancement layer. However, this necessitates use of the base layer for 
motion compensation which requires motion vectors. 

If the coincident VOP in the lower layer is used explicitly as reference, no motion vectors are sent and this 
mode can be used to provide spatial scalability. Spatial scalability in MPEG-2 uses spatio-temporal 
prediction, which is accomplished here by using the prediction modes available for B-VOPs. 

5.3.1 Spatial Scalability Decoding 

5.3.1.1. Base Layer and Enhancement Layer 
For spatial scalability, the output from decoding the base layer only has a different spatial resolution from the 
output of decoding both the base layer and the enhancement layer. For example, the resolution of the base 
layer may be QCIF and that of the enhancement layer CIF. In this case, when output 
with QCIF resolution is required, only the base layer is decoded; when output with CIF resolution 
is required, both the base layer and the enhancement layer are decoded. 

5.3.1.2. Decoding of Base Layer 

The decoding process of the base layer is the same as the non-scalable decoding process. 

5.3.1.3. Upsampling Process 

The upsampling process is performed in the midprocessor. The VOP of the base layer is locally decoded, 
and the decoded VOP is upsampled to the same resolution as that of the enhancement layer. In the 
example above, upsampling is performed by the filtering process described in Figure 3.7.1.1 and Table 
3.7.1.1. 

5.3.1.4. Decoding Process of Enhancement Layer 

The VOP in the enhancement layer is decoded as either P-VOP or B-VOP. 

5.3.1.5. Decoding of P-VOPs in Enhancement Layer 

In a P-VOP, ref_select_code is always '11', i.e., the prediction reference is set to the I-VOP that is the 
temporally coincident VOP in the base layer. In a P-VOP, the motion vector is always set to 0 during 
decoding. 

5.3.1.6. Decoding of B-VOPs in Enhancement Layer 

In a B-VOP, ref_select_code is always '00', i.e., the forward prediction reference is set to the P-VOP that 
is the temporally coincident VOP in the base layer, and the backward prediction reference is set to the P-VOP or 
B-VOP that is the most recent decoded VOP of the enhancement layer. In a B-VOP, when forward 
prediction (i.e., prediction from the base layer) is selected, the motion vector is always set to 0 during 
decoding. 

5.3.2 Temporal Scalability Decoding 

In object-based temporal scalability, a background composition technique is used in the case of Type 1 
scalability, as discussed in Section 3.7.2. Background composition is used to form the background 
region for objects in the enhancement layer. Figure 5.3.1 depicts background composition for a current 
VOP, where composition is performed using the previous and the next pictures in the base layer (e.g., the 
background region for the VOP at frame 2 in Figure 3.7.2 is composed using frames 0 and 6 in the base layer). 

In Figure 5.3.1, we show the background composition for the current frame in the enhancement layer. The 
dotted line represents the shape of the selected object at the previous frame in the base layer (called the 
"forward shape"). As the object moves, its shape at the next frame in the base layer is represented by a 
broken line (called the "backward shape"). For the region where the areas enclosed by these shapes overlap, the 
pixel value from the nearest frame in the base layer is used for the composed frame. Similarly, outside these 
objects, the pixel value from the nearest frame in the base layer is used. These areas are shown as white in 
Figure 5.3.1. For the region occupied only by the selected object of the previous frame in the base layer, the 
pixel value from the next frame in the base layer is used for the composed frame. This area is 
lightly shaded in Figure 5.3.1. On the other hand, for the region occupied only by the selected object of the 
next frame in the base layer, pixel values from the previous frame are used. This area is darkly shaded in 
Figure 5.3.1. 
(Figure: the "forward shape" is the shape of the selected object at the previous frame and the "backward 
shape" is the shape of the selected object at the next frame; regions are labelled "pixel value from the next 
frame", "pixel value from the previous frame" and "pixel value from the nearest frame (the previous or the next)".) 

Figure 5.3.1: Background composition. 
The following process is a mathematical description of the background composition method. 



$$
f_c(x,y,t) =
\begin{cases}
f(x,y,t_d) & \text{if } s(x,y,t_a) = s(x,y,t_d) \text{ and } |t - t_a| > |t - t_d|,\\
f(x,y,t_a) & \text{if } s(x,y,t_a) = s(x,y,t_d) \text{ and } |t - t_a| \le |t - t_d|,\\
f(x,y,t_d) & \text{if } s(x,y,t_a) = 1 \text{ and } s(x,y,t_d) = 0,\\
f(x,y,t_a) & \text{if } s(x,y,t_a) = 0 \text{ and } s(x,y,t_d) = 1,
\end{cases}
$$

where 

f_c : composed image 
f : decoded image of the base layer 
s : shape information (alpha plane) 
(x, y) : spatial coordinates 
t : time of the current frame 
t_a : time of the previous frame 
t_d : time of the next frame 



Two types of shape information, s(x, y, t_a) and s(x, y, t_d), are necessary for the background composition. 
s(x, y, t_a) is called the "forward shape" and s(x, y, t_d) is called the "backward shape" in Section 4.4. When a 
gray scale alpha plane is used, a positive value is regarded as the value "1" of the binary alpha plane. Note 
that the above technique is based on the assumption that the background is not moving. 
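The composition rule can be written as a per-pixel function. The Python sketch below illustrates it, assuming that images and alpha planes are given as equally sized lists of rows. The name compose_background is hypothetical. Gray-scale alpha is thresholded at zero, as stated above.

```python
def compose_background(f_prev, f_next, s_prev, s_next, t, t_a, t_d):
    """Compose the background f_c(x, y, t) from two base-layer frames.

    f_prev, f_next: decoded base-layer images at times t_a and t_d
    s_prev, s_next: forward and backward shapes (alpha planes)
    """
    # The next frame is the "nearest" one only if it is strictly closer.
    next_is_nearest = abs(t - t_a) > abs(t - t_d)
    composed = []
    for fa_row, fd_row, sa_row, sd_row in zip(f_prev, f_next, s_prev, s_next):
        row = []
        for fa, fd, sa, sd in zip(fa_row, fd_row, sa_row, sd_row):
            # Gray-scale alpha: any positive value counts as binary 1.
            a, d = int(sa > 0), int(sd > 0)
            if a == d:
                # Overlap of both shapes, or outside both: nearest frame.
                row.append(fd if next_is_nearest else fa)
            elif a == 1:
                # Covered only by the forward shape: take the next frame.
                row.append(fd)
            else:
                # Covered only by the backward shape: take the previous frame.
                row.append(fa)
        composed.append(row)
    return composed
```

With t_a = 0 and t_d = 6, a current frame at t = 1 takes unoccluded pixels from the previous frame. At t = 5, it takes them from the next frame.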
5.4 Compositor Definition 

The outputs of the decoders are the reconstructed VOPs, which are passed to the compositor. In the 
compositor, the VOPs are recursively blended in the order specified by VOP_composition_order. 

Each VOP has its own YUV and alpha values. Blending is done sequentially (two layers at a time). For 
example, if VOP N is overlaid on VOP M to generate a new VOP P, the composited Y, U, V and 
alpha values are: 

Pyuv = ((255 - Nalpha) * Myuv + (Nalpha * Nyuv)) / 255; 

Palpha = 255. 

When there are more than two VOPs in a sequence, this blending procedure is applied 
recursively to the YUV components, taking the output picture as the background. 
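The blending formula and its recursive application can be sketched in Python as shown below. This sketch makes two assumptions: integer division is used for the "/ 255" (the text does not specify rounding), and the VOP list is already sorted in VOP_composition_order with the bottom layer first. The helper names are hypothetical.

```python
from functools import reduce


def blend(m_yuv, n_yuv, n_alpha):
    """Overlay VOP N on VOP M sample by sample; the result has alpha 255."""
    # Integer division is an assumption; the text does not specify rounding.
    return [((255 - a) * m + a * n) // 255
            for m, n, a in zip(m_yuv, n_yuv, n_alpha)]


def composite(background_yuv, vops):
    """Blend (yuv, alpha) pairs onto the background, two layers at a time.

    vops is assumed to be sorted in VOP_composition_order, bottom layer first.
    """
    return reduce(lambda acc, vop: blend(acc, vop[0], vop[1]),
                  vops, background_yuv)
```

An alpha of 0 leaves the lower layer unchanged, and an alpha of 255 replaces it entirely. Intermediate values mix the two layers linearly.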

5.5 Flex_0 Composition Layer Syntax 

A composition script describes the arrangement of AV objects in a scene; in Flex_1, this composition 
script is expressed in a procedural language (such as Java). In Flex_0, the composition script is instead 
expressed by a fixed set of parameters; that is, the composition script is parametrized. These parameters are 
encoded and transmitted in a composition layer. This section briefly describes the bitstream syntax of the 
composition layer for Flex_0. 



5.5.1 Bitstream Syntax 

At any given time, a scene is composed of a collection of objects, according to composition parameters. 
The composition parameters for an object may be changed at any time by transmitting new parameters for 
the object. These parameters are timestamped, so that they need only be transmitted occasionally if desired. 

class SessionParameters {
    while (!end_of_session) {
        uint(30) timestamp;    // millisec since last update
        CompositionInformation composition_information;
    }
}

map motion_sets_table(int) {
    0b0,    0,
    0b10,   1,
    0b110,  2,
    0b1110, 3,
    0b1111, 4
};

class CompositionInformation {
    uint(5) video_object_id;
    bit(1) visibility;
    if (visibility) {
        bit(1) 3_dimensional;
        if (!3_dimensional)
            uint(5) composition_order;
        vlc(motion_sets_table) number_of_motion_sets;
        if (number_of_motion_sets > 0) {
            int(10) x_translation;
            int(10) y_translation;
            if (3_dimensional)
                int(10) z_translation;
        }
        if (number_of_motion_sets > 1) {
            int(10) x_delta_1;
            int(10) y_delta_1;
            if (3_dimensional)
                int(10) z_delta_1;
        }
        if (number_of_motion_sets > 2) {
            int(10) x_delta_2;
            int(10) y_delta_2;
            if (3_dimensional)
                int(10) z_delta_2;
        }
        if (number_of_motion_sets > 3) {
            int(10) x_delta_3;
            int(10) y_delta_3;
            if (3_dimensional)
                int(10) z_delta_3;
        }
    }
};
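The motion_sets_table codewords and the conditional int(10) fields that follow them can be illustrated with a small Python sketch. It assumes the bits are given as a '0'/'1' string. Both function names are hypothetical, and the sketch only lists field names without parsing their values.

```python
# Codewords of motion_sets_table (codeword -> number_of_motion_sets).
MOTION_SETS_TABLE = {"0": 0, "10": 1, "110": 2, "1110": 3, "1111": 4}


def decode_number_of_motion_sets(bits, pos=0):
    """Decode number_of_motion_sets at bits[pos]; return (value, new_pos)."""
    for length in range(1, 5):
        code = bits[pos:pos + length]
        if code in MOTION_SETS_TABLE:
            return MOTION_SETS_TABLE[code], pos + length
    raise ValueError("invalid motion_sets_table codeword")


def motion_set_fields(number_of_motion_sets, three_dimensional):
    """List the int(10) fields that follow, in bitstream order."""
    axes = ["x", "y", "z"] if three_dimensional else ["x", "y"]
    fields = []
    if number_of_motion_sets > 0:
        fields += [axis + "_translation" for axis in axes]
    # Each additional motion set n (1..3) contributes one delta per axis.
    for n in range(1, number_of_motion_sets):
        fields += ["%s_delta_%d" % (axis, n) for axis in axes]
    return fields
```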



5.5.2 Parameter Semantics 
The meaning of the above parameters is as follows: 

uint(5) video_object_id 
The ID of the VO whose composition information this data represents. 

boolean visibility 

Set if the object is visible. 

boolean 3_dimensional 

If set, the object has a 3D extent; otherwise it is purely 2D. 
uint(3) number_of_motion_sets 

Number of X, Y (and Z) data sets provided for translation and rotation. 
uint(5) composition_order 

This field is used to indicate the place in the object stack at which this object should be visualised. It is used to 
determine which objects occlude which other objects for 2D composition. 
int(10) x_translation, y_translation, z_translation 

Translation of the object relative to the origin of the scene coordinate system. 

int(10) x_delta_n, y_delta_n, z_delta_n 

Coordinate transformation information as per input contribution m1119.doc 
Appendix A 
Combined Motion Shape Texture Coding 

This appendix describes the combined motion shape texture coding. The text is based on Draft ITU-T 
Recommendation H.263, "Video Coding for Low Bitrate Communication", Sections 5.3 and 5.4. Annexes 
describing optional modes can be found in that document. 



B.1 Macroblock Layer 

Data for each macroblock consists of a macroblock header followed by data for blocks. The macroblock 
layer structure in I- or P-VOPs is shown in Figure B.2(a). first_MMR_code is only present for VOPs 
of arbitrary shape. COD is only present in VOPs for which VOP_prediction_type indicates P-VOPs 
(VOP_prediction_type = '01'). MCBPC is present when indicated by COD or when VOP_prediction_type 
indicates I-VOPs (VOP_prediction_type = '00'). CBPY, DQUANT, MVD and MVD2-4 are present when 
indicated by MCBPC. Block Data is present when indicated by MCBPC and CBPY. MVD2-4 are only 
present in Advanced Prediction mode. CR, a0_color, VLC_binary, RLB and ULB are only present when 
first_MMR_code indicates multilevel (see Section 4.6). 



first_MMR_code | COD | MCBPC | CBPY | DQUANT | MVD | MVD2 | MVD3 | MVD4 | CR | a0_color | VLC_binary | 
RLB/ULB | CODA | CBPA | Alpha Block Data | Block Data 

Figure B.2(a): Structure of macroblock layer in I- and P-VOPs 

The macroblock layer structure in B-VOPs (VOP_prediction_type = '10') is shown in Figure B.2(b). If 
COD indicates skipped (COD = '1') for an MB in the most recently decoded I- or P-VOP, then the 
co-located MB in the B-VOP is also skipped (no information is included in the bitstream). Otherwise, the 
macroblock layer is as shown in Figure B.2(b). 



first_MMR_code | MODB | MBTYPE | CBPB | DQUANT | MVDf | MVDb | MVDB | CR | a0_color | VLC_binary | 
RLB/ULB | COD | MODBA | CBPBA | A Block Data | Block Data 

Figure B.2(b): Structure of macroblock layer in B-VOPs 



MODB is present for every coded (non-skipped) macroblock in a B-VOP. MVDs (MVDf, MVDb or 
MVDB) and CBPB are present if indicated by MODB. The macroblock type is indicated by MBTYPE, which 
signals motion vector modes (MVDs) and quantization (DQUANT). 
B.1.1 Coded macroblock indication (COD) (1 bit) 

A bit which, when set to '0', signals that the macroblock is coded. If set to '1', no further information is 
transmitted for this macroblock; in that case the decoder shall treat the macroblock as a 'P' macroblock with 
the motion vector for the whole block equal to zero and with no coefficient data. COD is only present in VOPs 
for which VOP_prediction_type indicates 'P', for each macroblock in these VOPs. 

B.1.2 Macroblock type & Coded block pattern for chrominance (MCBPC) (Variable length) 

A variable length codeword giving information about the macroblock type and the coded block pattern for 
chrominance. The codewords for MCBPC are given in Table B.1 and Table B.2. MCBPC is always 
included in coded macroblocks. 

An extra codeword is available in the tables for bit stuffing. This codeword should be discarded by 
decoders. 

The macroblock type gives information about the macroblock and which data elements are present. 
Macroblock types and included elements are listed in Table B.3 and Table B.4. 



Table B.1: VLC table for MCBPC (for I-VOPs)

Index   MB type    CBPC (56)   Number of bits   Code
0       3          00          1                1
1       3          01          3                001
2       3          10          3                010
3       3          11          3                011
4       4          00          4                0001
5       4          01          6                0000 01
6       4          10          6                0000 10
7       4          11          6                0000 11
8       Stuffing   -           9                0000 0000 1



The coded block pattern for chrominance signifies Cb and/or Cr blocks when at least one non-INTRADC 
transform coefficient is transmitted (INTRADC is the dc-coefficient for INTRA blocks). CBPCN = 1 if any 
non-INTRADC coefficient is present for block N, else 0, for CBPC5 and CBPC6 in the coded block pattern. 
Block numbering is given in Figure B.3. When MCBPC = Stuffing, the remaining part of the macroblock 
layer is skipped. In this case, the preceding COD = 0 is not related to any coded or not-coded macroblock, 
and therefore the macroblock number is not incremented. For P-VOPs, multiple stuffings are accomplished 
by multiple sets of COD = 0 and MCBPC = Stuffing. 
Figure B.3: Arrangement of blocks in a macroblock 



Table B.2: VLC table for MCBPC (for P-VOPs)

Index   MB type    CBPC (56)   Number of bits   Code
0       0          00          1                1
1       0          01          4                0011
2       0          10          4                0010
3       0          11          6                0001 01
4       1          00          3                011
5       1          01          7                0000 111
6       1          10          7                0000 110
7       1          11          9                0000 0010 1
8       2          00          3                010
9       2          01          7                0000 101
10      2          10          7                0000 100
11      2          11          8                0000 0101
12      3          00          5                0001 1
13      3          01          8                0000 0100
14      3          10          8                0000 0011
15      3          11          7                0000 011
16      4          00          6                0001 00
17      4          01          9                0000 0010 0
18      4          10          9                0000 0001 1
19      4          11          9                0000 0001 0
20      Stuffing   -           9                0000 0000 1

Table B.3: Macroblock types and included data elements for a VOP

VOP type   MB type     Name      COD   MCBPC   CBPY   DQUANT   MVD   MVD2-4
P          not coded   -         x
P          0           INTER     x     x       x               x
P          1           INTER+Q   x     x       x      x        x
P          2           INTER4V   x     x       x               x     x
P          3           INTRA     x     x       x
P          4           INTRA+Q   x     x       x      x
P          stuffing    -         x     x
I          3           INTRA           x       x
I          4           INTRA+Q         x       x      x
I          stuffing    -               x

Note: "x" means that the item is present in the macroblock 



B.1.4 Coded block pattern for luminance (CBPY) (Variable length) 

Variable length codeword giving a pattern number signifying those Y blocks in the macroblock for which at 
least one non-INTRADC transform coefficient is transmitted (INTRADC is the dc-coefficient for INTRA 
blocks). 

CBPYN = 1 if any non-INTRADC coefficient is present for block N, else 0, for each bit CBPYN in the 
coded block pattern. Block numbering is given in Figure B.3, the leftmost bit of CBPY corresponding 
to block number 1. For a certain pattern CBPYN, different codewords are used for INTER and INTRA 
macroblocks, as defined in Table B.5. 
B.1.5 Quantizer Information (DQUANT) (1 or 2 bits) 
A one- or two-bit code to define a change in VOP_quantizer. Tables B.4A and B.4B give the differential values for 
the different codewords. VOP_quantizer ranges from 1 to 31; if the value for VOP_quantizer after 
adding the differential value is less than 1 or greater than 31, it is clipped to 1 and 31 respectively. Note that 
DQUANT can take the values -2, 0, or 2 in the case of B-VOPs, and these values are coded differently from 
other VOP types. 



Table B.4A: DQUANT codes and differential values for VOP_quantizer

Index   Differential value   DQUANT
0       -1                   00
1       -2                   01
2       1                    10
3       2                    11



Table B.4B: DQUANT codes and differential values for VOP_quantizer for B-VOPs

Index   Differential value   DQUANT
0       -2                   10
1       0                    0
2       2                    11
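Applying DQUANT amounts to a table lookup followed by clipping. The Python sketch below follows Tables B.4A and B.4B. It assumes the DQUANT codeword has already been read as a '0'/'1' string. The function name apply_dquant is hypothetical.

```python
# Table B.4A (VOPs other than B-VOPs) and Table B.4B (B-VOPs):
# codeword -> differential value.
DQUANT_IP = {"00": -1, "01": -2, "10": 1, "11": 2}
DQUANT_B = {"10": -2, "0": 0, "11": 2}


def apply_dquant(vop_quantizer, dquant_code, b_vop=False):
    """Add the DQUANT difference and clip VOP_quantizer to 1..31."""
    table = DQUANT_B if b_vop else DQUANT_IP
    return max(1, min(31, vop_quantizer + table[dquant_code]))
```

For B-VOPs, the 1-bit codeword '0' (no change) is the most common case, which explains why DQUANT is described as "1 or 2 bits".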



Table B.5: VLC table for CBPY

Index   CBPY(I)   CBPY(P)   Number of bits   Code
        (12 34)   (12 34)
0       00 00     11 11     4                0011
1       00 01     11 10     5                0010 1
2       00 10     11 01     5                0010 0
3       00 11     11 00     4                1001
4       01 00     10 11     5                0001 1
5       01 01     10 10     4                0111
6       01 10     10 01     6                0000 10
7       01 11     10 00     4                1011
8       10 00     01 11     5                0001 0
9       10 01     01 10     6                0000 11
10      10 10     01 01     4                0101
11      10 11     01 00     4                1010
12      11 00     00 11     4                0100
13      11 01     00 10     4                1000
14      11 10     00 01     4                0110
15      11 11     00 00     2                11
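In Table B.5, the row index equals CBPY(I) read as a 4-bit number (block 1 as the leftmost bit), and CBPY(P) is its bitwise complement. The Python sketch below exploits this to select the codeword for INTRA and INTER macroblocks. It assumes the codewords listed in Table B.5, which match the CBPY table of H.263. The function name cbpy_codeword is hypothetical.

```python
# Table B.5 codewords indexed by the INTRA pattern CBPY(I), read as a
# 4-bit number whose leftmost bit is block 1 and rightmost bit is block 4.
CBPY_CODES = [
    "0011", "00101", "00100", "1001",
    "00011", "0111", "000010", "1011",
    "00010", "000011", "0101", "1010",
    "0100", "1000", "0110", "11",
]


def cbpy_codeword(pattern, intra):
    """Return the CBPY codeword for a 4-bit pattern of coded Y blocks."""
    if not 0 <= pattern <= 15:
        raise ValueError("CBPY pattern must fit in 4 bits")
    # CBPY(P) is the bitwise complement of CBPY(I) on the same row, so an
    # INTER pattern p uses the codeword of row 15 - p.
    index = pattern if intra else 15 - pattern
    return CBPY_CODES[index]
```

As a result, the shortest codeword, '11', is assigned to "all four blocks coded" for INTRA macroblocks and to "no blocks coded" for INTER macroblocks.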







B.1.6 Motion Vector Coding 

Motion vectors for predicted and interpolated VOPs are coded differentially within a row of macroblocks, 
obeying the following rules: 
- In P-VOPs, differential motion vectors are generated as described in the section on differential coding of 
motion vectors (Section 3.3.2.6). However, the differential motion vectors are coded as described in this section. 

- In B-VOPs, every forward or backward motion vector is coded relative to the last vector of the same 
type. Each component of the vector is coded independently, the horizontal component first and then 
the vertical component. The prediction motion vector is set to zero in the macroblocks at the start of 
a row of macroblocks. If the previous macroblock is skipped, the prediction for the current macroblock is reset 
to zero. 

- In B-VOPs, only vectors that are used for the selected prediction mode (MB type) are coded. Only 
vectors that have been coded are used as prediction motion vectors. 



The VLC used to encode the differential motion vector data depends upon the range of the vectors. The 
maximum range that can be represented is determined by the forward_f_code and backward_f_code 
encoded in the VOP header. 

The differential motion vector component is calculated. Its range is compared with the values given in 
Table B.6, and it is reduced to fall in the correct range by the following algorithm: 

if (diff_vector < -range)
    diff_vector = diff_vector + 2*range;
else if (diff_vector > range-1)
    diff_vector = diff_vector - 2*range;



Table B.6: Range for motion vectors

forward_f_code or backward_f_code   Range in half sample units
1                                   32
2                                   64
3                                   128
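The range reduction above and the ranges of Table B.6 can be expressed together in Python. In this sketch, the formula 32 << (f_code - 1) simply reproduces the three table entries; it is not stated in the text. The helper names are hypothetical.

```python
def mv_range(f_code):
    """Range in half-sample units from Table B.6 (32, 64, 128 for f_code 1..3)."""
    if f_code not in (1, 2, 3):
        raise ValueError("Table B.6 lists f_code values 1 to 3 only")
    return 32 << (f_code - 1)


def reduce_diff_vector(diff_vector, f_code):
    """Wrap a differential component into [-range, range - 1]."""
    rng = mv_range(f_code)
    if diff_vector < -rng:
        diff_vector += 2 * rng
    elif diff_vector > rng - 1:
        diff_vector -= 2 * rng
    return diff_vector
```

Because the decoder performs the same modular wrap when it adds the difference to the prediction, the reconstructed vector is unchanged even though the transmitted difference is smaller.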



This value is scaled and coded in two parts by concatenating a VLC found from Table B.7 and a fixed-length
part according to the following algorithm:

Let f_code be either the forward_f_code or backward_f_code as appropriate, and diff_vector be the
differential motion vector reduced to the correct range.

    if (diff_vector == 0) {
        residual = 0;
        vlc_code_magnitude = 0;
    }
    else {
        scale_factor = 1 << (f_code - 1);
        residual = (abs(diff_vector) - 1) % scale_factor;
        vlc_code_magnitude = (abs(diff_vector) - residual) / scale_factor;
        if (scale_factor != 1)
            vlc_code_magnitude += 1;
    }

vlc_code_magnitude and the sign of diff_vector are encoded according to Table B.7. The residual is
encoded as a fixed-length code using (f_code-1) bits. If f_code is 1 or if diff_vector is 0, the residual is
not coded.
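The split into vlc_code_magnitude and residual can be written as a C function. `MvdParts`, `mvd_split` and `mvd_join` are hypothetical names. `mvd_join` is an assumed decoder-side inverse in the MPEG-1/2 style, abs(diff_vector) = (vlc_code_magnitude - 1) * scale_factor + residual + 1; this section does not state it, and it is included only so that the round trip can be checked.

```c
#include <assert.h>
#include <stdlib.h>

typedef struct {
    int vlc_code_magnitude;  /* 0..32, coded with the sign via Table B.7 */
    int negative;            /* 1 if diff_vector < 0 */
    int residual;            /* sent in f_code-1 bits when f_code > 1 and diff_vector != 0 */
} MvdParts;

/* Encoder side, following the algorithm above. */
static MvdParts mvd_split(int diff_vector, int f_code)
{
    MvdParts p = {0, 0, 0};
    assert(f_code >= 1 && f_code <= 3);
    if (diff_vector != 0) {
        int scale_factor = 1 << (f_code - 1);
        int a = abs(diff_vector);
        p.residual = (a - 1) % scale_factor;
        p.vlc_code_magnitude = (a - p.residual) / scale_factor;
        if (scale_factor != 1)
            p.vlc_code_magnitude += 1;
        p.negative = diff_vector < 0;
    }
    return p;
}

/* Assumed decoder-side inverse (not specified in this section). */
static int mvd_join(MvdParts p, int f_code)
{
    int scale_factor, a;
    assert(f_code >= 1 && f_code <= 3);
    if (p.vlc_code_magnitude == 0)
        return 0;
    scale_factor = 1 << (f_code - 1);
    a = (p.vlc_code_magnitude - 1) * scale_factor + p.residual + 1;
    return p.negative ? -a : a;
}
```

For every f_code the magnitude never exceeds 32, which is why Table B.7 only needs entries up to plus or minus 32 half-sample steps.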



B.1.7 Motion vector data (MVD) (Variable length)

MVD is included for all INTER macroblocks and consists of a variable length codeword for the horizontal
component followed by a variable length codeword for the vertical component. Variable length codes are
given in Table B.7.



Table B.7 VLC table for MVD

    Index  Vector differences  Bit number  Codes
    0      -16                 13          0000 0000 0010 1
    1      -15.5               13          0000 0000 0011 1
    2      -15                 12          0000 0000 0101
    3      -14.5               12          0000 0000 0111
    4      -14                 12          0000 0000 1001
    5      -13.5               12          0000 0000 1011
    6      -13                 12          0000 0000 1101
    7      -12.5               12          0000 0000 1111
    8      -12                 11          0000 0001 001
    9      -11.5               11          0000 0001 011
    10     -11                 11          0000 0001 101
    11     -10.5               11          0000 0001 111
    12     -10                 11          0000 0010 001
    13     -9.5                11          0000 0010 011
    14     -9                  11          0000 0010 101
    15     -8.5                11          0000 0010 111
    16     -8                  11          0000 0011 001
    17     -7.5                11          0000 0011 011
    18     -7                  11          0000 0011 101
    19     -6.5                11          0000 0011 111
    20     -6                  11          0000 0100 001
    21     -5.5                11          0000 0100 011
    22     -5                  10          0000 0100 11
    23     -4.5                10          0000 0101 01
    24     -4                  10          0000 0101 11
    25     -3.5                8           0000 0111
    26     -3                  8           0000 1001
    27     -2.5                8           0000 1011
    28     -2                  7           0000 111
    29     -1.5                5           0001 1
    30     -1                  4           0011
    31     -0.5                3           011
    32     0                   1           1
    33     0.5                 3           010
    34     1                   4           0010
    35     1.5                 5           0001 0
    36     2                   7           0000 110
    37     2.5                 8           0000 1010
    38     3                   8           0000 1000
    39     3.5                 8           0000 0110
    40     4                   10          0000 0101 10
    41     4.5                 10          0000 0101 00
    42     5                   10          0000 0100 10
    43     5.5                 11          0000 0100 010
    44     6                   11          0000 0100 000
    45     6.5                 11          0000 0011 110
    46     7                   11          0000 0011 100
    47     7.5                 11          0000 0011 010
    48     8                   11          0000 0011 000
    49     8.5                 11          0000 0010 110
    50     9                   11          0000 0010 100
    51     9.5                 11          0000 0010 010
    52     10                  11          0000 0010 000
    53     10.5                11          0000 0001 110
    54     11                  11          0000 0001 100
    55     11.5                11          0000 0001 010
    56     12                  11          0000 0001 000
    57     12.5                12          0000 0000 1110
    58     13                  12          0000 0000 1100
    59     13.5                12          0000 0000 1010
    60     14                  12          0000 0000 1000
    61     14.5                12          0000 0000 0110
    62     15                  12          0000 0000 0100
    63     15.5                13          0000 0000 0011 0
    64     16                  13          0000 0000 0010 0
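Table B.7 is symmetric: for every non-zero vector difference, the positive and negative codewords share a prefix and differ only in the last bit, which is 0 for positive and 1 for negative differences. The sketch below uses that observation to store only 33 prefixes. It assumes the signed magnitude passed in is vlc_code_magnitude carrying the sign of diff_vector, so that the table index is 32 + signed_magnitude. `MVD_PREFIX` and `mvd_vlc` are hypothetical names, and codewords are returned as '0'/'1' strings for readability rather than written to a bitstream.

```c
#include <assert.h>
#include <string.h>

/* Table B.7 codewords for magnitudes 0..32 (half-sample steps) with the
   trailing sign bit removed; magnitude 0 is the single-bit code "1". */
static const char *const MVD_PREFIX[33] = {
    "1",
    "01", "001", "0001", "000011", "0000101", "0000100", "0000011",
    "000001011", "000001010", "000001001", "0000010001", "0000010000",
    "0000001111", "0000001110", "0000001101", "0000001100",
    "0000001011", "0000001010", "0000001001", "0000001000",
    "0000000111", "0000000110", "0000000101", "0000000100",
    "00000000111", "00000000110", "00000000101", "00000000100",
    "00000000011", "00000000010", "000000000011", "000000000010"
};

/* Writes the codeword for signed_magnitude (-32..32) into out, which must
   hold at least 14 characters. Returns the number of bits. */
static int mvd_vlc(int signed_magnitude, char *out)
{
    int m = signed_magnitude < 0 ? -signed_magnitude : signed_magnitude;
    assert(m <= 32);
    strcpy(out, MVD_PREFIX[m]);
    if (m != 0)
        strcat(out, signed_magnitude < 0 ? "1" : "0");  /* trailing sign bit */
    return (int)strlen(out);
}
```

For example, a difference of -0.5 (signed magnitude -1, index 31) yields 011, and +2 (signed magnitude 4, index 36) yields 0000110, matching the table rows.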



B.1.8 Motion vector data (MVD2-4) (Variable length)

The three codewords MVD2-4 are included if indicated by VOP_prediction_type and by MCBPC, and
each consists of a variable length codeword for the horizontal component followed by a variable length
codeword for the vertical component of each vector. Variable length codes are given in Table B.7.



B.1.9 Macroblock mode for B-blocks (MODB) (Variable length)

MODB is present only in coded macroblocks belonging to B-VOPs. The meaning of this codeword is the same
as that in H.263. It is a variable length codeword indicating whether MBTYPE and/or CBPB information is
present. If MBTYPE does not exist, the default is set to "Direct (H.263 B)". The codewords for MODB
are defined in Table B.8.
Table B.8 VLC table for MODB

    Index  CBPB  MBTYPE  Number of bits  Code
    0                    1               0
    1            x       2               10
    2      x     x       2               11

Note: "x" means that the item is present in the macroblock
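A MODB decoder following Table B.8 can be sketched as below. The `Modb` struct and `modb_decode` are hypothetical names, and the bitstream is represented as a string of '0'/'1' characters purely for illustration. When `has_mbtype` is 0, the caller would apply the default mode "Direct (H.263 B)" stated above.

```c
#include <assert.h>
#include <stddef.h>

/* Which optional elements follow MODB (Table B.8). */
typedef struct {
    int has_cbpb;    /* CBPB present */
    int has_mbtype;  /* MBTYPE present; if absent, the mode defaults to Direct (H.263 B) */
    int bits;        /* codeword length consumed, 0 if no valid codeword */
} Modb;

/* Decode MODB from the start of a string of '0'/'1' characters. */
static Modb modb_decode(const char *bits)
{
    Modb m = {0, 0, 0};
    assert(bits != NULL);
    if (bits[0] == '0') {
        m.bits = 1;                        /* code 0:  neither CBPB nor MBTYPE */
    } else if (bits[0] == '1' && bits[1] == '0') {
        m.has_mbtype = 1;                  /* code 10: MBTYPE only */
        m.bits = 2;
    } else if (bits[0] == '1' && bits[1] == '1') {
        m.has_mbtype = 1;                  /* code 11: MBTYPE and CBPB */
        m.has_cbpb = 1;
        m.bits = 2;
    }
    return m;
}
```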



B.1.10 Macroblock Type (MBTYPE) for Coded B-VOPs (Variable length)
MBTYPE is present only in coded macroblocks belonging to B-VOPs. Furthermore, it is present only in
those macroblocks where at least one MVD is sent. MBTYPE indicates the type of macroblock coding
used, for example, H.263-like motion compensation or MPEG-1-like motion compensation with forward,
backward or interpolated prediction, and a change of quantizer, if any, by use of DQUANT. The codewords for
MBTYPE are defined in Table B.9.



Table B.9 MBTYPES and included data elements in coded macroblocks in B-VOPs

    Index  MBTYPE              DQUANT  MVDf  MVDb  MVDB  Number of bits  Code
    0      Direct (H.263 B)                        x     1               1
    1      Interpolate MC + Q  x       x     x           2               01
    2      Backward MC + Q     x             x           3               001
    3      Forward MC + Q      x       x                 4               0001

Note: "x" means that the item is present in the macroblock



Rather than refer to each MBTYPE by an index or by its long explanation in terms of MC mode and
quantizer information, we refer to each of them by a coding mode name, as follows:

• Direct Coding (Direct MC, no new Q)

• Bidirectional Coding (Interpolate MC + Q)

• Backward Coding (Backward MC + Q)

• Forward Coding (Forward MC + Q)
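Because the MBTYPE codewords in Table B.9 are 1, 01, 001 and 0001, the index equals the number of leading zeros. The sketch below decodes MBTYPE that way and exposes which data elements each mode carries. `MbType`, `mbtype_decode` and the `mbtype_has_*` helpers are hypothetical names, and the bitstream is again a '0'/'1' string for illustration.

```c
#include <assert.h>
#include <stddef.h>

typedef enum {
    MB_DIRECT = 0,       /* "1"    : MVDB only (Direct Coding) */
    MB_INTERPOLATE = 1,  /* "01"   : DQUANT, MVDf, MVDb (Bidirectional Coding) */
    MB_BACKWARD = 2,     /* "001"  : DQUANT, MVDb (Backward Coding) */
    MB_FORWARD = 3       /* "0001" : DQUANT, MVDf (Forward Coding) */
} MbType;

/* Returns the MBTYPE index and stores the codeword length in *len,
   or returns -1 if no valid codeword starts the string. */
static int mbtype_decode(const char *bits, int *len)
{
    int zeros = 0;
    assert(bits != NULL && len != NULL);
    while (zeros < 4 && bits[zeros] == '0')
        zeros++;
    if (zeros == 4 || bits[zeros] != '1')
        return -1;
    *len = zeros + 1;
    return zeros;
}

/* Data elements present for each mode, per Table B.9. */
static int mbtype_has_dquant(MbType t)       { return t != MB_DIRECT; }
static int mbtype_has_mvdf(MbType t)         { return t == MB_INTERPOLATE || t == MB_FORWARD; }
static int mbtype_has_mvdb(MbType t)         { return t == MB_INTERPOLATE || t == MB_BACKWARD; }
static int mbtype_has_direct_delta(MbType t) { return t == MB_DIRECT; }  /* MVDB */
```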

B.1.11 Coded block pattern for B-blocks (CBPB) (6 bits)

CBPB is only present in B-VOPs if indicated by MODB. For each bit CBPB_N in the coded block pattern,
CBPB_N = 1 if any coefficient is present for B-block N, else 0. The numbering of blocks has been shown
earlier; the leftmost bit of CBPB corresponds to block number 1. When MODB = 0 or 1, the default
value of CBPB is set to 0, which means that no coefficients are sent.
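Since the leftmost (most significant) of the six CBPB bits corresponds to block 1, testing block N amounts to a shift by 6 - N. `cbpb_block_coded` is a hypothetical helper name; CBPB is assumed to be held in the low six bits of an unsigned integer.

```c
#include <assert.h>

/* Returns 1 if B-block n (1..6) has coefficients, given a 6-bit CBPB
   whose leftmost bit corresponds to block 1. */
static int cbpb_block_coded(unsigned cbpb, int n)
{
    assert(cbpb < 64u && n >= 1 && n <= 6);
    return (int)((cbpb >> (6 - n)) & 1u);
}
```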

B.1.12 Quantizer Information for B-Macroblocks (DQUANT) (2 bits)
The meaning of DQUANT and the codewords employed are the same as those in I- or P-VOPs. The
computed quantizer is scaled by a factor depending on the selected global quantizer scale for B-VOPs,
DBQUANT.
B.1.13 Motion vector data for Forward Prediction (MVDf) (Variable length)
MVDf is the motion vector of a macroblock in a B-VOP with respect to the temporally previous reference VOP
(an I- or a P-VOP). It consists of a variable length codeword for the horizontal component followed by a
variable length codeword for the vertical component. The variable length codes employed are the same ones
as used for MVD and MVD2-4 for P-VOPs.

B.1.14 Motion vector data for Backward Prediction (MVDb) (Variable length)
MVDb is the motion vector of a macroblock in a B-VOP with respect to the temporally following reference VOP
(an I- or a P-VOP). It consists of a variable length codeword for the horizontal component followed by a
variable length codeword for the vertical component. The variable length codes employed are the same ones
as used for MVD and MVD2-4 for P-VOPs.

B.1.15 Motion vector data for Direct Prediction (MVDB) (Variable length)
MVDB is only present in B-VOPs if indicated by MODB and MBTYPE, and consists of a variable
length codeword for the horizontal component followed by a variable length codeword for the vertical
component of each vector. MVDB represents delta vectors that are used to correct B-VOP macroblock
motion vectors, which are obtained by scaling P-VOP macroblock motion vectors. The variable length codes
employed are the same ones as used for MVD and MVD2-4 for P-VOPs.
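This section does not spell out how the P-VOP vector is scaled. As an assumption, the sketch below applies the H.263 PB-frame rule (ITU-T H.263, Annex G) that the mode name "Direct (H.263 B)" points to: with MV the co-located P-VOP macroblock vector, TRD the temporal distance between the two reference VOPs and TRB the distance from the previous reference VOP to the B-VOP, MVf = TRB*MV/TRD + MVDB, and MVb = (TRB - TRD)*MV/TRD when MVDB is 0, otherwise MVb = MVf - MV. The function name and parameters are hypothetical; it handles one vector component in half-sample units.

```c
#include <assert.h>

/* Assumed H.263-style direct-mode vectors for one component.
   C integer division truncates toward zero, as "/" does in H.263. */
static void direct_mode_vectors(int mv, int mvdb, int trb, int trd,
                                int *mv_f, int *mv_b)
{
    assert(trd > 0 && trb > 0 && trb < trd);
    *mv_f = (trb * mv) / trd + mvdb;          /* forward vector, corrected by MVDB */
    if (mvdb == 0)
        *mv_b = ((trb - trd) * mv) / trd;     /* pure scaling of the P-VOP vector */
    else
        *mv_b = *mv_f - mv;                   /* keep forward/backward consistent */
}
```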

B.2. Block Layer

A macroblock comprises four luminance blocks and one of each of the two colour difference
blocks. The same structure is used for all types of VOPs: I, P or B. Presently, intra macroblocks are
supported in both I- and P-VOPs. For such macroblocks, INTRADC is present for every block of each
macroblock, and TCOEF is present if indicated by MCBPC or CBPY. For non-intra macroblocks of P-VOPs,
TCOEF is present if indicated by MCBPC or CBPY. For B-VOP macroblocks, TCOEF is present if
indicated by CBPB. Figure B.4 shows a generalized block layer for all types of VOPs.



    INTRADC | TCOEF

Figure B.4 Structure of block layer
B.2.1 DC coefficient for INTRA blocks (INTRADC) (8 bits)

A codeword of 8 bits. The code 0000 0000 is not used. The code 1000 0000 is not used; the reconstruction
level of 1024 is coded as 1111 1111 (see Table B.10).



Table B.10 Reconstruction levels for INTRA-mode DC coefficient

    Index  FLC              Reconstruction level into inverse transform
    0      0000 0001 (1)    8
    1      0000 0010 (2)    16
    2      0000 0011 (3)    24
    ...
    126    0111 1111 (127)  1016
    127    1111 1111 (255)  1024
    128    1000 0001 (129)  1032
    ...
    252    1111 1101 (253)  2024
    253    1111 1110 (254)  2032
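Table B.10 follows a simple rule: the reconstruction level is 8 times the FLC value, except that 1111 1111 (255) stands for 1024 because 1000 0000 is not used. The decoder mapping below follows the table directly; the encoder-side mapping is a hypothetical sketch whose rounding and clipping rules are assumptions, not something this section specifies.

```c
#include <assert.h>

/* Reconstruction level for an 8-bit INTRADC code, following Table B.10. */
static int intradc_level(unsigned code)
{
    assert(code != 0u && code != 128u && code <= 255u);   /* unused codes */
    return code == 255u ? 1024 : (int)(8u * code);
}

/* Hypothetical encoder-side mapping: round a non-negative DC value to the
   nearest multiple of 8, clip to the representable levels 8..2032, and
   send level 1024 as 1111 1111. */
static unsigned intradc_code(int dc)
{
    int q = (dc + 4) / 8;
    if (q < 1)
        q = 1;
    if (q > 254)
        q = 254;
    return q == 128 ? 255u : (unsigned)q;
}
```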



B.2.2 Transform coefficient (TCOEF) (Variable length)

The most commonly occurring EVENTs are coded with the variable length codes given in Table B.11. The
last bit "s" denotes the sign of the level: "0" for positive and "1" for negative.

An EVENT is a combination of a last non-zero coefficient indication (LAST; "0": there are more nonzero
coefficients in this block, "1": this is the last nonzero coefficient in this block), the number of successive
zeros preceding the coded coefficient (RUN), and the non-zero value of the coded coefficient (LEVEL).
The remaining combinations of (LAST, RUN, LEVEL) are coded with a 22-bit word consisting of 7 bits
ESCAPE, 1 bit LAST, 6 bits RUN and 8 bits LEVEL. Use of this 22-bit word for encoding the
combinations listed in Table B.11 is not prohibited. For the 8-bit word for LEVEL, the codes 0000 0000
and 1000 0000 are not used. The codes for RUN and for LEVEL are given in Table B.12.
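The EVENT definition can be illustrated by turning a block's coefficients, already in scan order, into (LAST, RUN, LEVEL) triples. `Event` and `make_events` are hypothetical names; which coefficients the caller passes (for example, excluding a DC value sent separately as INTRADC) is left as an assumption outside this sketch.

```c
#include <assert.h>
#include <stddef.h>

/* One TCOEF event. */
typedef struct {
    int last;   /* 1 if this is the last non-zero coefficient of the block */
    int run;    /* number of zeros preceding the coefficient */
    int level;  /* signed non-zero value; its sign becomes the "s" bit */
} Event;

/* Converts count coefficients (in scan order) into events written to ev,
   which needs room for count entries. Returns the number of events. */
static int make_events(const int *coef, int count, Event *ev)
{
    int i, n = 0, run = 0, last_nz = -1;
    assert(coef != NULL && ev != NULL && count >= 0 && count <= 64);
    for (i = 0; i < count; i++)        /* locate the last non-zero coefficient */
        if (coef[i] != 0)
            last_nz = i;
    for (i = 0; i <= last_nz; i++) {
        if (coef[i] == 0) {            /* extend the current zero run */
            run++;
            continue;
        }
        ev[n].last = (i == last_nz);
        ev[n].run = run;
        ev[n].level = coef[i];
        n++;
        run = 0;
    }
    return n;
}
```

Each resulting triple is then looked up in Table B.11 using the magnitude of LEVEL, falling back to the 22-bit ESCAPE word when the combination is not listed.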



Table B.11 VLC table for TCOEF

    INDEX  LAST  RUN  LEVEL  BITS  VLC CODE
    0      0     0    1      3     10s
    1      0     0    2      5     1111s
    2      0     0    3      7     0101 01s
    3      0     0    4      8     0010 111s
    4      0     0    5      9     0001 1111s
    5      0     0    6      10    0001 0010 1s
    6      0     0    7      10    0001 0010 0s
    7      0     0    8      11    0000 1000 01s
    8      0     0    9      11    0000 1000 00s
    9      0     0    10     12    0000 0000 111s
    10     0     0    11     12    0000 0000 110s
    11     0     0    12     12    0000 0100 000s
    12     0     1    1      4     110s
    13     0     1    2      7     0101 00s
    14     0     1    3      9     0001 1110s
    15     0     1    4      11    0000 0011 11s
    16     0     1    5      12    0000 0100 001s
    17     0     1    6      13    0000 0101 0000s
    18     0     2    1      5     1110s
    19     0     2    2      9     0001 1101s
    20     0     2    3      11    0000 0011 10s
    21     0     2    4      13    0000 0101 0001s
    22     0     3    1      6     0110 1s
    23     0     3    2      10    0001 0001 1s
    24     0     3    3      11    0000 0011 01s
    25     0     4    1      6     0110 0s
    26     0     4    2      10    0001 0001 0s
    27     0     4    3      13    0000 0101 0010s
    28     0     5    1      6     0101 1s
    29     0     5    2      11    0000 0011 00s
    30     0     5    3      13    0000 0101 0011s
    31     0     6    1      7     0100 11s
    32     0     6    2      11    0000 0010 11s
    33     0     6    3      13    0000 0101 0100s
    34     0     7    1      7     0100 10s
    35     0     7    2      11    0000 0010 10s
    36     0     8    1      7     0100 01s
    37     0     8    2      11    0000 0010 01s
    38     0     9    1      7     0100 00s
    39     0     9    2      11    0000 0010 00s
    40     0     10   1      8     0010 110s
    41     0     10   2      13    0000 0101 0101s
    42     0     11   1      8     0010 101s
    43     0     12   1      8     0010 100s
    44     0     13   1      9     0001 1100s
    45     0     14   1      9     0001 1011s
    46     0     15   1      10    0001 0000 1s
    47     0     16   1      10    0001 0000 0s
    48     0     17   1      10    0000 1111 1s
    49     0     18   1      10    0000 1111 0s
    50     0     19   1      10    0000 1110 1s
    51     0     20   1      10    0000 1110 0s
    52     0     21   1      10    0000 1101 1s
    53     0     22   1      10    0000 1101 0s
    54     0     23   1      12    0000 0100 010s
    55     0     24   1      12    0000 0100 011s
    56     0     25   1      13    0000 0101 0110s
    57     0     26   1      13    0000 0101 0111s
    58     1     0    1      5     0111s
    59     1     0    2      10    0000 1100 1s
    60     1     0    3      12    0000 0000 101s
    61     1     1    1      7     0011 11s
    62     1     1    2      12    0000 0000 100s
    63     1     2    1      7     0011 10s
    64     1     3    1      7     0011 01s
    65     1     4    1      7     0011 00s
    66     1     5    1      8     0010 011s
    67     1     6    1      8     0010 010s
    68     1     7    1      8     0010 001s
    69     1     8    1      8     0010 000s
    70     1     9    1      9     0001 1010s
    71     1     10   1      9     0001 1001s
    72     1     11   1      9     0001 1000s
    73     1     12   1      9     0001 0111s
    74     1     13   1      9     0001 0110s
    75     1     14   1      9     0001 0101s
    76     1     15   1      9     0001 0100s
    77     1     16   1      9     0001 0011s
    78     1     17   1      10    0000 1100 0s
    79     1     18   1      10    0000 1011 1s
    80     1     19   1      10    0000 1011 0s
    81     1     20   1      10    0000 1010 1s
    82     1     21   1      10    0000 1010 0s
    83     1     22   1      10    0000 1001 1s
    84     1     23   1      10    0000 1001 0s
    85     1     24   1      10    0000 1000 1s
    86     1     25   1      11    0000 0001 11s
    87     1     26   1      11    0000 0001 10s
    88     1     27   1      11    0000 0001 01s
    89     1     28   1      11    0000 0001 00s
    90     1     29   1      12    0000 0100 100s
    91     1     30   1      12    0000 0100 101s
    92     1     31   1      12    0000 0100 110s
    93     1     32   1      12    0000 0100 111s
    94     1     33   1      13    0000 0101 1000s
    95     1     34   1      13    0000 0101 1001s
    96     1     35   1      13    0000 0101 1010s
    97     1     36   1      13    0000 0101 1011s
    98     1     37   1      13    0000 0101 1100s
    99     1     38   1      13    0000 0101 1101s
    100    1     39   1      13    0000 0101 1110s
    101    1     40   1      13    0000 0101 1111s
    102    ESCAPE            7     0000 011



Table B.12 FLC table for RUNS and LEVELS

RUNS

    Index  Run  Code
    0      0    000 000
    1      1    000 001
    2      2    000 010
    ...
    63     63   111 111

LEVELS

    Index  Level  Code
           -128   FORBIDDEN
    0      -127   1000 0001
    ...
    125    -2     1111 1110
    126    -1     1111 1111
           0      FORBIDDEN
    127    1      0000 0001
    128    2      0000 0010
    ...
    253    127    0111 1111
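Table B.12 shows that RUN is sent as a plain 6-bit binary number and LEVEL as an 8-bit two's-complement value with -128 and 0 forbidden. Combined with the 7-bit ESCAPE code 0000 011 from Table B.11, this gives the 22-bit word described in B.2.2. The sketch below packs it into the low 22 bits of an integer, most significant bit first; `escape_word` is a hypothetical name.

```c
#include <assert.h>
#include <stdint.h>

/* Packs an ESCAPE-coded EVENT into the low 22 bits of the result:
   0000011 (ESCAPE) | LAST (1 bit) | RUN (6 bits) | LEVEL (8 bits). */
static uint32_t escape_word(int last, int run, int level)
{
    assert(last == 0 || last == 1);
    assert(run >= 0 && run <= 63);
    assert(level >= -127 && level <= 127 && level != 0);  /* -128 and 0 forbidden */
    return (UINT32_C(0x03) << 15)          /* ESCAPE 0000011 in bits 21..15 */
         | ((uint32_t)last << 14)          /* LAST in bit 14 */
         | ((uint32_t)run << 8)            /* RUN in bits 13..8 */
         | ((uint32_t)level & 0xFFu);      /* LEVEL, 8-bit two's complement */
}
```

For example, LAST = 0, RUN = 2, LEVEL = -127 packs to 0x18281, whose low byte 1000 0001 matches the Table B.12 entry for -127.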






Appendix B 
Core Experiments 



The following is a summary list of core experiments. Core experiments have been divided into 9 classes. 
For more information, please refer to the appropriate documents as listed below. 



Table 1 List of Core Experiments

No.     Core Experiment

Prediction (ISO/IEC JTC1/SC29/WG11/N1250)
P1      Core Experiment on Global Motion Compensation
P2      Core Experiments of Block-Partitioning for Motion Prediction
P3      Core Experiments of STFM/LTFM Memory for Motion Prediction
P4      Motion Segmentation and Compensation for Improved Coding Efficiency
P5      Comparison of Entropy Constrained Variable Block Size Motion Estimation/Compensation
P6      2D Triangle Mesh Based MC Prediction
P7      Core Experiment on New Block ME/MC
P8      Core Experiment on Motion and Aliasing Compensating Prediction

Frame Texture Coding (ISO/IEC JTC1/SC29/WG11/N1250)
        Wavelet Coding of I and P Pictures
T3      Matching Pursuits Coding of Prediction Errors
T4      3D-DCT Coding of B-Pictures
T5      Vector Wavelet Coding of I-Pictures
T6      Vector Wavelet Coding of P-Pictures
T7      Core Experiment Description of Modulated Lapped Transform Enhancement Coding
T8      Core Experiment on Variable-Size Lapped Transforms Coding of I & P Pictures
T9/10   Core Experiment on Improved Intra Coding
T11     Adaptive Transform/Quadtree Template Coding of Intra Macroblocks
T12     Residue suppression for frame-based DPCM Coding: Coding of Human Color Perception

Quantization and Rate Control (ISO/IEC JTC1/SC29/WG11/N1250)
Q2      Improved Rate Control
Q3      Analysis of Arithmetic Coding for the MPEG-4 video VM

Shape and Alpha Channel Coding (ISO/IEC JTC1/SC29/WG11/N1230)
S1      Comparison of Gray Scale Shape Coding Techniques
S2      Geometrical Transformation and Representation of Sprites
S3      Variable Block Size Segmentation
S4      Comparison of Shape coding Techniques
S5      Shape Adaptive Region Partitioning Method for Arbitrary Shaped VOP

Object/Region Texture Coding (ISO/IEC JTC1/SC29/WG11/N1225)
O1      Comparison of coding techniques for Sprite objects
O2      DCT coding of macroblocks padded using Alpha-channel
O3      Wavelet/Subband coding of Region Texture
O4      Shape Adaptive DCT for coding of Region Texture
O7      Mean Replacement DCT coding of Region Texture
O8      E/I (Extension Interpolation) DCT for Region Texture Coding
O9      Coding of Arbitrarily Shaped Textured Image Segments

Error Resilient Core Experiments (ISO/IEC JTC1/SC29/WG11/N1224)
E1      Error Resilient Core Experiments on Resynchronization Techniques
E2      Error Resilient Core Experiments on [...]
E3      Core Experiment on Error Resilient Tools
E4      Core Experiment on Error Resilient Methods Based on Back Channel Signaling and FEC
E5      Core Experiment on Error Concealment Techniques

Bandwidth and Complexity Scaling (ISO/IEC JTC1/SC29/WG11)
B1      Generalized Temporal-Spatial Scalable Coding
C1      Content based temporal scalability

Multi-view, Model Manipulation and [...] (ISO/IEC JTC1/SC29/WG11)
M1      Mismatch Corrected Stereo/Multiview Coding
M2      2-D Triangle Mesh for Object/Content Manipulation

Pre-, Mid- and Post-processing (ISO/IEC JTC1/SC29/WG11)
N1      Comparison of Coding Noise Removal Techniques
N2      Comparison of Automatic Segmentation Techniques
N3      Automatic on-line generation of Sprites